Citation: Dandan HUANG, Yiwen WANG, Guangqiu CHEN, Qi HU, Siyu YU, Zhiyu CUI. Video Semantic Segmentation Based on Spatiotemporal Dual Branch Attention[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.2023-00046

Video Semantic Segmentation Based on Spatiotemporal Dual Branch Attention

    Abstract: Video semantic segmentation is widely used in autonomous driving, traffic management, unmanned aerial vehicle operation, and other fields. To address inaccurate segmentation results and segmentation times too long for real-time use, this paper proposes a memory-network segmentation algorithm based on a spatio-temporal dual-branch attention mechanism. The algorithm consists of three parts: a memory storage module, a feature integration module, and a spatio-temporal dual-branch attention module. The memory storage module stores the feature information of historical frames together with the temporal information of the sequence. The feature integration module uses decomposed large-kernel convolution to enlarge the convolutional receptive field without increasing the amount of computation; it captures long-range dependencies and context information while performing channel compression and feature reduction. The spatio-temporal dual-branch attention module captures the features of the same object in adjacent frames and fuses them, exploiting the temporal ordering of video frames to improve segmentation accuracy. On the public datasets Cityscapes and CamVid, the algorithm achieves mIoU of 76.92% and 73.68%, respectively, at a speed of 38 frames/s. Experiments show that, compared with other video semantic segmentation algorithms, the method achieves the best segmentation accuracy and computational efficiency.
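For readers unfamiliar with the mIoU figure reported above, the following is a minimal sketch of how mean intersection-over-union is commonly computed from flattened label maps. It is not taken from the paper: the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that actually occur are all assumptions made for illustration. Benchmark evaluations usually accumulate the confusion matrix over the whole dataset rather than per image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth.

    `pred` and `target` are flattened per-pixel class ids of equal length.
    `ignore_index` (assumed to be 255 here) marks unlabeled pixels to skip.
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                                       # correct pixels of class c
        fn = sum(confusion[c]) - tp                                # class c pixels missed
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # pixels wrongly labeled c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example: four pixels, two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.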

     
