3D Cross-Modal ConvFormer for Lung Cancer Recognition
Abstract: In 3D medical images, the irregular shapes and large inter-case variability of lung tumors lead to insufficient feature extraction and inaccurate recognition. To address this, a 3D cross-modal lung tumor recognition model, 3D-CConvFormer, is proposed based on CNN and Transformer. First, a three-branch network learns lesion features from 3D PET, CT, and PET/CT images. Second, an efficient ConvFormer module is designed to fuse global features with shallow local features, and self-calibrated attention convolution is used to effectively enlarge the receptive field, improving the extraction of lesion information within each modality. Finally, a dual-branch cross-modal feature interaction block with different resolutions is designed; it uses two global attention mechanisms to cross-learn modality-specific, global, and local information, interactively enhancing cross-modal feature extraction and capturing complementary 3D multimodal information. Experiments are conducted on a 3D multimodal lung tumor dataset of 3,173 patients. With a favorable parameter count and runtime, 3D-CConvFormer achieves the best performance, with 89.25% accuracy and an 88.74% AUC, providing reliable computer-aided support for the diagnosis of 3D multimodal lung tumor disease.
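The cross-modal interaction block described above can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the general pattern of two global attention mechanisms cross-learning two modality branches (here hypothetical PET-derived and CT-derived token features), using plain NumPy for clarity:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: one modality's tokens attend
    over the other modality's tokens (a single global attention head)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

# Toy token features for two branches (shapes are illustrative only):
# 8 tokens per branch, 16 feature dimensions each.
rng = np.random.default_rng(0)
pet = rng.standard_normal((8, 16))
ct = rng.standard_normal((8, 16))

# Two global attention mechanisms applied in a crossed fashion,
# each branch enhanced by attending to the other modality.
pet_enhanced = pet + cross_attention(pet, ct)  # PET queries attend to CT
ct_enhanced = ct + cross_attention(ct, pet)    # CT queries attend to PET

# Fuse the interactively enhanced branches for downstream recognition.
fused = np.concatenate([pet_enhanced, ct_enhanced], axis=-1)  # shape (8, 32)
```

In the actual model, such attention would operate on 3D feature maps at two different resolutions and be trained end-to-end; the sketch only conveys the cross-learning data flow.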