3D Cross-Modal ConvFormer for Lung Cancer Recognition
Abstract: To address insufficient feature extraction and inaccurate recognition caused by the irregular shapes and large variability of lung tumors in 3D medical images, a 3D cross-modal lung tumor recognition model based on CNN and Transformer, 3D-CConvFormer, is proposed. First, a three-branch network learns lesion features from 3D PET, CT, and PET/CT images. Second, an efficient ConvFormer module that fuses global features with shallow local features is designed, and self-calibrated convolution is used to effectively enlarge the receptive field, improving the extraction of lesion information within each modality. Finally, a dual-branch cross-modal feature interaction block operating at two resolutions is designed; it uses two global attention mechanisms to cross-learn information across modalities and across global and local scales, interactively strengthening cross-modal feature extraction. Experiments are conducted on a 3D multimodal lung tumor dataset of 3,173 patients. With favorable parameter count and running time, 3D-CConvFormer achieves the best performance among the compared models, reaching 89.25% accuracy and an AUC of 88.74%, providing reliable computer-aided support for the diagnosis of lung tumors in 3D multimodal images.
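The cross-modal interaction described in the abstract, where two global attention mechanisms let each modality's features attend to the other's, can be illustrated with a minimal single-head cross-attention sketch. This is a hedged simplification, not the paper's actual block: the function names, token counts, and feature dimensions below are hypothetical, and the real model operates on 3D feature maps at two resolutions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feat, kv_feat):
    """Single-head cross-attention: queries come from one modality,
    keys/values from the other (simplified; no learned projections)."""
    d = q_feat.shape[-1]
    scores = q_feat @ kv_feat.T / np.sqrt(d)      # (n_q, n_kv) similarity
    return softmax(scores, axis=-1) @ kv_feat     # (n_q, d) attended features

rng = np.random.default_rng(0)
pet = rng.standard_normal((16, 32))  # hypothetical PET-branch tokens
ct = rng.standard_normal((16, 32))   # hypothetical CT-branch tokens

# Two attention mechanisms applied crosswise, as the abstract describes:
pet_enhanced = cross_attention(pet, ct)  # PET queries attend to CT features
ct_enhanced = cross_attention(ct, pet)   # CT queries attend to PET features
print(pet_enhanced.shape, ct_enhanced.shape)
```

In the full model each attended output would be fused back into its own branch, so both streams carry complementary information from the other modality.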