Video Semantic Segmentation Based on Spatiotemporal Dual Branch Attention
-
Abstract
Video semantic segmentation is widely used in autonomous driving, traffic management, drone navigation, and other fields. To address the inaccurate segmentation results and the long segmentation times that prevent real-time performance in video semantic segmentation, a memory network segmentation algorithm based on a spatio-temporal dual-branch attention mechanism is proposed. The algorithm consists mainly of a memory storage module, a feature integration module, and a spatio-temporal dual-branch attention module. The memory storage module stores the feature information of historical frames together with the temporal information of the sequence. The feature integration module uses large-kernel decomposed convolution to enlarge the convolutional receptive field without increasing the computational cost, capturing long-range dependencies and context information while performing channel compression and feature reduction. The spatio-temporal dual-branch attention module captures the features of the same object in adjacent frames and fuses them, exploiting the temporal order of video frames to improve segmentation accuracy. The algorithm achieves 76.92% and 73.68% mIoU on the public Cityscapes and CamVid datasets, respectively, at a processing speed of 38 frames/s. Experiments show that, compared with other video semantic segmentation algorithms, the method achieves the best results in both segmentation accuracy and computational efficiency.
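The cost argument behind large-kernel decomposition can be illustrated with a short count of multiply-accumulate operations. The sketch below assumes one common form of decomposition, in which a dense k x k convolution is replaced by a 1 x k convolution followed by a k x 1 convolution with the channel count held constant; the abstract does not specify the exact decomposition used, so this form, the function names, and the feature-map size are illustrative assumptions only.

```python
def conv_macs(height, width, c_in, c_out, kernel_h, kernel_w):
    """Multiply-accumulates of a dense 2D convolution (stride 1, 'same' padding)."""
    return height * width * c_in * c_out * kernel_h * kernel_w


def decomposed_macs(height, width, channels, k):
    """Cost of a 1 x k convolution followed by a k x 1 convolution.

    Assumption: the channel count stays constant across both steps.
    """
    return (conv_macs(height, width, channels, channels, 1, k)
            + conv_macs(height, width, channels, channels, k, 1))


if __name__ == "__main__":
    h, w, c = 64, 128, 256  # hypothetical feature-map size, not from the paper
    for k in (3, 7, 11):
        full = conv_macs(h, w, c, c, k, k)
        split = decomposed_macs(h, w, c, k)
        print(f"k={k}: dense={full:,} decomposed={split:,} ratio={full / split:.2f}")
```

Under these assumptions the dense cost grows with k squared while the decomposed cost grows with 2k, so the saving relative to a dense kernel of the same size is a factor of k/2 and widens as the receptive field grows.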
-
-