Research on Image Semantic Segmentation Methods Based on Event Cameras

王超毅; 于男男; 乔羽; 任健康; 周东生; 魏小鹏; 张强; 杨鑫

doi:10.3724/SP.J.1089.2023-00698

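Abstract: Image semantic segmentation is an important research topic in computer image processing and is widely used in fields such as autonomous driving and medical imaging. With the rapid development of deep learning, semantic segmentation based on RGB cameras has made notable progress. It still faces challenges in real scenes with over-exposure, low light, and fast-moving objects, where image information is partly lost (under adverse lighting, an RGB camera cannot provide enough texture and color information to support reliable segmentation) or corrupted (images of fast-moving objects captured by an RGB camera contain heavy motion blur). Unlike an RGB camera, an event camera is a novel bio-inspired vision sensor that records only changes in light intensity at each pixel and outputs asynchronous event data. Because event cameras offer high dynamic range, fast response, and low power consumption, they can still image effectively in challenging conditions such as over-exposure and low light, and they do not produce motion blur. Event cameras therefore open a new route to image semantic segmentation in real challenging scenes. However, event-based semantic segmentation datasets are scarce, and building one with high-precision annotations takes enormous human and material resources. Existing event-based image semantic segmentation datasets therefore usually use segmentation results predicted by deep learning algorithms as semantic labels, and such labels are easily biased by the performance of those algorithms. To address these problems, this paper uses a simulation environment to build Carla-Semantic, a large-scale image semantic segmentation dataset for event cameras with high-precision annotations. It provides synchronized RGB images, event frame images, and accurate pixel-level semantic labels. In addition, because event data are unevenly distributed and sparse in some regions, we design EVISS (Event-Based Image Semantic Segmentation), which further enhances the event representation from a global perspective and strengthens the global connections and contextual dependencies among image pixels. Through an improved graph Laplacian formulation, EVISS introduces a position-independent attention diagonal matrix to better capture long-range dependencies and thus extract better high-level event features. In experiments on our Carla-Semantic dataset, EVISS improves on Ev-SegNet by 1.85% in MPA and by 1.68% in mIoU.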
Keywords: artificial intelligence; silicon retina; bionic vision; graph learning; image semantic segmentation; event camera
CLC number: TP391.41
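MPA (mean pixel accuracy) and mIoU (mean intersection over union), the two metrics reported above, are conventionally computed from a per-class confusion matrix. The sketch below is a minimal NumPy version under conventions we assume for illustration: label 255 is ignored, and a class with no ground-truth pixels is left out of the MPA mean. It is not the paper's evaluation code, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Rows index ground-truth classes, columns index predicted classes."""
    mask = gt != ignore_index  # drop pixels carrying the (assumed) ignore label
    idx = num_classes * gt[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_pixel_accuracy_and_miou(gt, pred, num_classes, ignore_index=255):
    cm = confusion_matrix(gt, pred, num_classes, ignore_index).astype(np.float64)
    tp = np.diag(cm)
    gt_count = cm.sum(axis=1)    # TP + FN per class
    pred_count = cm.sum(axis=0)  # TP + FP per class
    with np.errstate(divide="ignore", invalid="ignore"):
        pixel_acc = tp / gt_count                  # NaN for classes absent from gt
        iou = tp / (gt_count + pred_count - tp)    # TP / (TP + FP + FN)
    return float(np.nanmean(pixel_acc)), float(np.nanmean(iou))


# Toy 2x3 label map with 3 classes.
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
mpa, miou = mean_pixel_accuracy_and_miou(gt, pred, num_classes=3)
print(f"MPA={mpa:.4f}, mIoU={miou:.4f}")  # MPA=0.6667, mIoU=0.5000
```

Building one confusion matrix over the whole test set and reducing it once (rather than averaging per-image scores) is a common choice, because it weights every pixel equally.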

Abstract: Image semantic segmentation is an essential task in computer vision and is applied in fields such as obstacle avoidance for drones, autonomous driving, and medical imaging. Although segmentation based on RGB cameras has advanced considerably, significant challenges remain in real scenes. For example, RGB cameras struggle to image effectively in over-exposed and low-light scenes and cannot provide sufficient semantic information. Because of the way they sample frames, RGB cameras also produce motion blur when objects in the scene move fast, which further complicates semantic segmentation. An event camera is a novel bionic vision sensor whose imaging principle differs from that of a traditional RGB camera: it captures changes in the light intensity of each pixel and generates event data asynchronously. With high dynamic range, high response speed, and low power consumption, an event camera can image effectively in challenging scenes such as over-exposure and low light without motion blur. It therefore offers a new solution for semantic segmentation in realistic challenging scenarios. However, semantic segmentation datasets for event cameras are scarce, and creating an image semantic segmentation dataset with high-quality annotations takes considerable manpower and material resources. As a workaround, existing event-based semantic segmentation datasets use results predicted by deep learning algorithms as semantic labels, and the accuracy of such pseudo-labels is limited by the performance of the algorithm that produced them. To solve these problems, this paper uses a simulation environment to create Carla-Semantic, a large-scale event camera image semantic segmentation dataset that provides synchronized RGB images, event frame images, and accurate pixel-level semantic labels.
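The event-generation principle described above is commonly modeled as follows: a pixel fires an event (x, y, t, p) each time its log intensity changes by a contrast threshold C, and the polarity p = ±1 records the sign of the change. The sketch below approximates this between two intensity frames and then accumulates the events into a two-channel (positive count, negative count) event frame. The threshold value, the frame-difference approximation, and the two-channel layout are our assumptions for illustration; a real sensor fires asynchronously, and the abstract does not state how Carla-Semantic's event frames are built.

```python
import numpy as np


def events_from_frames(prev, curr, t, threshold=0.2, eps=1e-3):
    """Approximate events between two intensity frames (values in [0, 1]).

    Each pixel emits floor(|dlog| / threshold) events stamped with time t, where
    dlog is its change in log intensity; polarity is the sign of dlog.
    Returns an (N, 4) float array with columns x, y, t, p.
    """
    dlog = np.log(curr + eps) - np.log(prev + eps)  # eps avoids log(0) on black pixels
    counts = np.floor(np.abs(dlog) / threshold).astype(np.int64)
    ys, xs = np.nonzero(counts)
    reps = counts[ys, xs]
    pol = np.sign(dlog[ys, xs])
    return np.stack([
        np.repeat(xs, reps).astype(np.float64),
        np.repeat(ys, reps).astype(np.float64),
        np.full(reps.sum(), t, dtype=np.float64),
        np.repeat(pol, reps),
    ], axis=1)


def accumulate_event_frame(events, height, width):
    """Count events per pixel into a (2, H, W) frame: channel 0 = positive, 1 = negative."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    xs = events[:, 0].astype(np.int64)
    ys = events[:, 1].astype(np.int64)
    pos = events[:, 3] > 0
    np.add.at(frame[0], (ys[pos], xs[pos]), 1.0)  # unbuffered add counts repeated pixels
    np.add.at(frame[1], (ys[~pos], xs[~pos]), 1.0)
    return frame


# Static 4x5 scene in which one pixel brightens and one darkens.
prev = np.full((4, 5), 0.5)
curr = prev.copy()
curr[1, 2] = 1.0
curr[3, 0] = 0.25
events = events_from_frames(prev, curr, t=0.01)
frame = accumulate_event_frame(events, height=4, width=5)
```

Unchanged pixels emit nothing, which is why event data are sparse in static or textureless regions, the property the method section targets.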
In addition, because event data are unevenly distributed and sparse in some regions, we design EVISS, an event-based image semantic segmentation network that enhances the event feature representation from a global perspective and strengthens the global connections among all points in the image. Through an improved graph Laplacian formulation, we introduce a position-independent attention diagonal matrix, which better captures long-range contextual relationships and thus yields better high-level event features. We evaluate our method on the Carla-Semantic dataset. Compared with Ev-SegNet, the proposed method improves MPA by 1.85% and mIoU by 1.68%.
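The abstract does not give the improved graph Laplacian formula, so the sketch below is not EVISS's operator. It only illustrates the general idea under our own assumptions: treat every pixel feature as a node of a fully connected graph, form the symmetric normalized affinity $\hat{A} = D^{-1/2} A D^{-1/2}$, and replace the identity in the normalized Laplacian $I - \hat{A}$ with a diagonal matrix $\Lambda$ whose entries come from node features only, never from coordinates (our reading of "position-independent"). The update $Y = X - (\Lambda - \hat{A}) X W$, the sigmoid attention, and all function names are hypothetical.

```python
import numpy as np


def normalized_affinity(x):
    """Dense symmetric normalized affinity D^{-1/2} A D^{-1/2} over all N nodes."""
    sim = x @ x.T / np.sqrt(x.shape[1])  # scaled dot-product similarity
    # A global shift only rescales A by a constant, which the normalization cancels.
    a = np.exp(sim - sim.max())
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def feature_attention(x, w_att):
    """Hypothetical diagonal attention: one weight in (0, 1) per node, from features only."""
    return 1.0 / (1.0 + np.exp(-(x @ w_att)))


def attention_graph_update(x, lam, w):
    """One step Y = X - (diag(lam) - A_hat) X W.

    With lam = 1 the operator is the normalized Laplacian I - A_hat, and with
    w = I the step reduces to plain graph smoothing A_hat @ x.
    """
    laplacian = np.diag(lam) - normalized_affinity(x)
    return x - laplacian @ x @ w


# Toy example: a 2x3 feature map with 4 channels flattened to N = 6 nodes.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
lam = feature_attention(x, rng.standard_normal(4))
y = attention_graph_update(x, lam, np.eye(4))
```

Because the graph is fully connected, every node aggregates information from every other node in one step, which is the kind of long-range dependency the abstract describes; the dense N x N affinity is also why such modules are usually applied to downsampled feature maps.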