Zihang Feng, Liping Yan, Jinglan Bai, Yuanqing Xia, Bo Xiao. RGB-T Target Tracking Algorithm with Salient Content Perception and Deep Feature Fusion[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.20140


RGB-T Target Tracking Algorithm with Salient Content Perception and Deep Feature Fusion


    Abstract: In current RGB-thermal-infrared (RGB-T) video tracking methods, the bounding box cannot properly describe the target shape, so parameter training does not fully focus on the target area. In terms of feature representation, single-layer deep features have difficulty balancing category semantic information and spatial structure information. To address these problems, an RGB-T tracking algorithm with salient content perception and deep feature fusion is proposed in this article. Firstly, for the two modalities, visible spectrum and thermal-infrared spectrum, saliency maps of the target are extracted and fused. Secondly, the fused saliency map is used to optimize the weighting coefficient map of the spatial regularization term, strengthening the influence of training samples in the salient region on filter training. Finally, a pre-trained convolutional neural network is used to extract multi-layer deep features of the two modalities; these features contain abundant category semantic information and spatial structure information, and are fused at the response level. Experimental results on the two RGB-T tracking datasets GTOT and RGBT210 demonstrate the effectiveness of the proposed algorithm compared with existing trackers: it achieves precision rates of 88.4% and 72.7%, and success rates of 71.9% and 51.0%, respectively.
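The three steps above (saliency fusion, saliency-driven spatial regularization, and response-level fusion of multi-layer features) can be sketched as follows. The abstract does not specify the exact fusion rules or weight mappings, so this NumPy sketch uses illustrative assumptions: a convex combination with parameter `alpha`, a linear mapping to penalty weights via `w_min`/`w_max`, and hypothetical per-layer weights `layer_weights`; it is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def fuse_saliency(sal_rgb, sal_t, alpha=0.5):
    # Step 1 (illustrative): convex combination of the visible and
    # thermal saliency maps, normalized to [0, 1] to act as a mask.
    fused = alpha * sal_rgb + (1.0 - alpha) * sal_t
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused

def spatial_reg_weights(fused_sal, w_min=0.1, w_max=10.0):
    # Step 2 (illustrative): salient (likely-target) pixels receive a
    # small regularization penalty so the filter may respond strongly
    # there; background pixels receive a large penalty.
    return w_max - (w_max - w_min) * fused_sal

def fuse_responses(resps_rgb, resps_t, layer_weights):
    # Step 3 (illustrative): response-level fusion as a weighted sum of
    # per-layer correlation responses from both modalities.
    total = np.zeros(resps_rgb[0].shape, dtype=float)
    for w, r_v, r_t in zip(layer_weights, resps_rgb, resps_t):
        total += w * (r_v + r_t)
    return total
```

The target location would then be taken as the peak of the fused response map, following the usual correlation-filter convention.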
