Xiong Lankun, Zhang Guimei, Liu Huiqun, Ma Shangke. Combination of Axial Enhanced Transformer and CNN Network for medical image segmentation[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.2024-00356

Combination of Axial Enhanced Transformer and CNN Network for medical image segmentation

Hybrid models that combine a Swin Transformer with a CNN have proven effective for medical image segmentation. However, semantic gaps exist between the features extracted by the two networks, so fusing these features directly yields unsatisfactory segmentation accuracy. In addition, the Swin Transformer lacks pixel-level modeling capability within patches. To address these problems, we propose a medical image segmentation method built on a dual encoder that combines an axial enhanced Transformer with a CNN. To bridge the semantic gap between features, the method introduces a new feature fusion module in the encoding stage. In addition, cross-fusion is used together with spatial channel attention and cross-domain enhancement modules to merge the features extracted by the two networks. These measures aim to keep the fused features semantically consistent and effective, which improves the model's expressiveness. To address the Swin Transformer's limited pixel-level modeling ability, an axial enhanced Transformer encoder captures correlations between pixels along both the height and width dimensions. This improves the model's pixel-level modeling capability and, in turn, its segmentation accuracy. Experiments are conducted on four medical image datasets (GlaS, MoNuSeg, JSRT, and ISIC2018), with comparisons against a range of mainstream segmentation models. The results show that the proposed model achieves the best Dice, IoU, precision, and recall among the compared models on these datasets, and that it can be applied to segmenting a wide range of medical images.
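The four reported metrics (Dice, IoU, precision, and recall) have standard definitions for binary segmentation masks. The sketch below shows one way to compute them from a predicted mask and a ground-truth mask. It is an illustration only, not the authors' evaluation code: the function name `segmentation_metrics`, the nested-list mask format, and the convention of returning 0.0 when a denominator is zero are all assumptions made here.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[Sequence[int]],
                         target: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Compute Dice, IoU, precision, and recall for two binary masks.

    Both masks are 2D sequences of 0/1 values with the same shape
    (an assumed format chosen for this sketch).
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1   # foreground predicted, foreground in ground truth
            elif p and not t:
                fp += 1   # foreground predicted, background in ground truth
            elif t and not p:
                fn += 1   # background predicted, foreground in ground truth

    def safe_div(num: float, den: float) -> float:
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    return {
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    print(segmentation_metrics(predicted, ground_truth))
```

In this toy example there are two true positives, one false positive, and one false negative, so IoU is 2/4 and Dice is 4/6. Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why rankings under the two metrics usually agree.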