Wan Lei, Li Huafeng, Zhang Yafei. Infrared-Visible Person Re-Identification via Multi-Modality Feature Fusion and Self-Distillation[J]. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(7): 1065-1076. DOI: 10.3724/SP.J.1089.2024.19886

Infrared-Visible Person Re-Identification via Multi-Modality Feature Fusion and Self-Distillation


Abstract: Most existing cross-modality person re-identification methods mine modality-invariant features while ignoring the discriminative features specific to each modality. To fully exploit these modality-specific features, an infrared-visible person re-identification method based on multi-modality feature fusion and self-distillation is proposed. First, an attention fusion mechanism built on a dual classifier is introduced: it assigns larger fusion weights to the modality-specific features of each modality and smaller weights to the features shared across modalities, yielding multi-modality fusion features that retain the discriminative modality-specific cues. To make the network features robust to changes in pedestrian appearance, a memory bank is constructed to store multi-view features of each pedestrian. A parameter-free dynamic guidance strategy for self-distillation is also designed; guided by the multi-modality fusion features and the multi-view features, it dynamically strengthens the network's multi-modality and multi-view reasoning abilities. As a result, the network can infer a pedestrian's features in the other modality and under different views from a single-modality image, improving cross-modality re-identification performance. Comparative experiments against current mainstream methods, implemented in the PyTorch deep learning framework, are conducted on the public datasets SYSU-MM01 and RegDB. The proposed method achieves Rank-1 accuracies of 63.12% and 92.55% and mAP scores of 61.51% and 89.55%, respectively, outperforming the compared methods.
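The abstract names two concrete mechanisms: a dual-classifier attention fusion that up-weights discriminative modality-specific channels, and a self-distillation loss whose guidance strength is adjusted dynamically without extra learnable parameters. The PyTorch sketch below is a minimal illustration of what such components could look like; the names (DualClassifierFusion, self_distill_loss), the channel-attention design, and the use of teacher confidence as the dynamic weight are assumptions made here for illustration, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualClassifierFusion(nn.Module):
    # Fuses visible and infrared features with per-channel weights.
    # Two modality-specific classifiers supervise the two inputs so the
    # attention can learn which channels carry modality-specific evidence.
    def __init__(self, dim: int, num_ids: int):
        super().__init__()
        self.cls_vis = nn.Linear(dim, num_ids)   # classifier on visible features
        self.cls_ir = nn.Linear(dim, num_ids)    # classifier on infrared features
        self.attn = nn.Sequential(               # per-channel fusion weights in (0, 1)
            nn.Linear(2 * dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, f_vis, f_ir):
        # w -> 1 keeps the visible channel, w -> 0 keeps the infrared one.
        # The intent is that modality-specific channels get pushed toward the
        # extremes and dominate the fusion, while shared channels receive
        # intermediate weights.
        w = self.attn(torch.cat([f_vis, f_ir], dim=1))
        fused = w * f_vis + (1.0 - w) * f_ir
        return fused, self.cls_vis(f_vis), self.cls_ir(f_ir)

def self_distill_loss(student_logits, teacher_logits, T=4.0):
    # Temperature-scaled KL self-distillation with a parameter-free dynamic
    # weight: each sample is scaled by the teacher's own confidence, so
    # uncertain teachers guide less. (The weighting scheme is an assumption.)
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)       # teacher is not trained here
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)  # per-sample KL
    weight = p_t.max(dim=1).values                            # confidence, no learnable params
    return (weight * kl).mean() * T * T

In a training loop, logits computed from the fused features (and, analogously, from multi-view features read out of the memory bank) would serve as the teacher for each single-modality branch, alongside the usual identity classification loss.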
