Combination of Axial-Enhanced Transformer and CNN Network for Medical Image Segmentation
Graphical Abstract
Abstract
Hybrid models that combine the Swin Transformer with a CNN have proven effective for medical image segmentation. However, a semantic gap exists between the features extracted by the two networks within such hybrid models, so directly fusing these features yields unsatisfactory segmentation accuracy. Moreover, the Swin Transformer lacks pixel-level modeling capability within patches. To address these challenges, we propose a novel medical image segmentation method that integrates an axial-enhanced Transformer and a CNN in a dual-encoder architecture. To bridge the semantic gap between features, our method introduces a new feature fusion module in the encoding stage. In addition, we employ cross-fusion together with spatial-channel attention and cross-domain enhancement modules to effectively merge the features extracted by the two networks. These measures ensure that the fused features remain semantically consistent and effective, thereby enhancing the model's expressiveness. To address the Swin Transformer's limited pixel-level modeling ability, an axial-enhanced Transformer encoder captures correlations between pixels along both the height and width dimensions, significantly improving the model's pixel-level modeling capability and, in turn, its segmentation accuracy. Experiments are conducted on four medical image datasets, namely GlaS, MoNuSeg, JSRT, and ISIC2018, comparing our model with several mainstream segmentation models. The experimental results show that the proposed model achieves the best Dice, IoU, precision, and recall across these diverse datasets, and that it can be applied to segmenting a wide range of medical images.
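To illustrate the axial mechanism the abstract refers to, the sketch below shows generic axial self-attention in plain NumPy: attention is computed independently along the height axis and then along the width axis, so each pixel attends to its row and column rather than to all pixels. This is a minimal illustration of the general technique only; the function names, shapes, and single-head form are assumptions for clarity and do not reproduce the paper's actual encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, Wq, Wk, Wv, axis):
    """Single-head self-attention restricted to one spatial axis.

    x: (H, W, C) feature map; Wq/Wk/Wv: (C, C) projections (hypothetical names).
    axis=0 attends along height (each column independently);
    axis=1 attends along width (each row independently).
    """
    if axis == 0:
        x = x.transpose(1, 0, 2)  # (W, H, C): attend over H within each column
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])  # (rows, L, L)
    out = softmax(scores, axis=-1) @ v
    if axis == 0:
        out = out.transpose(1, 0, 2)  # back to (H, W, C)
    return out

# Height-then-width axial attention covers row and column context at
# O(HW(H+W)) cost instead of the O((HW)^2) cost of full 2-D attention.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
x = rng.standard_normal((H, W, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
y = axial_attention(axial_attention(x, Wq, Wk, Wv, axis=0), Wq, Wk, Wv, axis=1)
print(y.shape)
```

A useful property of the width-axis pass is that each output row depends only on its own input row, which is what makes the per-axis cost linear in the attended dimension.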