Wan Lei, Li Huafeng, Zhang Yafei. Infrared-Visible Person Re-identification Via Multi-modality Feature Fusion and Self-distillation[J]. Journal of Computer-Aided Design & Computer Graphics.

Infrared-Visible Person Re-identification Via Multi-modality Feature Fusion and Self-distillation


    Abstract: Most existing cross-modality person re-identification methods mine modality-invariant features while ignoring the discriminative self-owned features within each modality. To fully exploit these self-owned features, an infrared-visible person re-identification method via multi-modality feature fusion and self-distillation is proposed. Specifically, an attention fusion mechanism based on a dual classifier is proposed: it assigns larger fusion weights to the self-owned features of each modality and smaller weights to the common features, yielding multi-modality fusion features that retain each modality's discriminative self-owned features. Meanwhile, to improve the robustness of the extracted features against changes in pedestrian appearance, a memory bank is constructed to store the multi-view features of pedestrians. In addition, a parameter-free dynamic guidance strategy for self-distillation is designed: under the guidance of the multi-modality fusion features and the multi-view features, it dynamically strengthens the network's multi-modality and multi-view reasoning abilities. As a result, the network can infer the features of a pedestrian under different views in the other modality from a single-modality image, thereby improving cross-modality person re-identification performance. Implemented in the PyTorch deep learning framework, the method achieves Rank-1 accuracies of 63.12% and 92.55% and mAP of 61.51% and 89.55% on the public SYSU-MM01 and RegDB datasets, respectively. The experimental results show that the proposed method outperforms the compared methods.
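To make the dual-classifier attention fusion concrete, the sketch below illustrates the weighting idea in pure Python. This is a minimal illustration, not the paper's exact formulation: the function names, the per-channel granularity, and the use of the two classifiers' response discrepancy as the "self-owned" score are all assumptions. Channels on which the two modality classifiers disagree strongly are treated as modality-specific (self-owned) and receive larger fusion weights, so they dominate the fused feature.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_features(feat_ir, feat_vis, score_ir, score_vis):
    """Fuse per-channel infrared and visible features (illustrative only).

    score_ir / score_vis: per-channel "self-owned" scores, here assumed to
    come from the discrepancy between the two classifiers' responses. A
    large score marks a channel as modality-specific, so that modality's
    value receives the larger fusion weight for that channel.
    """
    fused = []
    for f_i, f_v, s_i, s_v in zip(feat_ir, feat_vis, score_ir, score_vis):
        w_i, w_v = softmax([s_i, s_v])  # weights sum to 1 per channel
        fused.append(w_i * f_i + w_v * f_v)
    return fused
```

With equal scores the two modalities contribute equally; as one modality's score grows, its feature dominates that channel, which mirrors the abstract's description of self-owned features being assigned larger fusion weights than common features.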

     
