MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation

Authors
Liu, Jun [1]
Li, Kunqi [1]
Huang, Chun [1]
Dong, Hua [1]
Song, Yusheng [2]
Li, Rihui [3, 4]
Affiliations
[1] Nanchang Hangkong Univ, Dept Informat Engn, Nanchang 330063, Jiangxi, Peoples R China
[2] Peoples Hosp Ganzhou, Dept Intervent Radiol, Ganzhou 341000, Jiangxi, Peoples R China
[3] Univ Macau, Inst Collaborat Innovat, Ctr Cognit & Brain Sci, Macau, Peoples R China
[4] Univ Macau, Fac Sci & Technol, Dept Elect & Comp Engn, Macau, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image segmentation; Transformers; Feature extraction; Semantics; Decoding; Computational modeling; Medical diagnostic imaging; Computer architecture; Computer vision; Convolutional neural networks; Medical image segmentation (SEG); mixed convolutional neural network (CNN)-Transformer backbone; mixed multibranch dilated attention (MMDA); multiscale spatial-aware fusion (MSAF); ATTENTION;
DOI
10.1109/TIM.2024.3497060
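Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract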
Transformers using self-attention mechanisms have recently advanced medical imaging by modeling long-range semantic dependencies, though they lack the ability of convolutional neural networks (CNNs) to capture local spatial details. This study introduced a novel segmentation (SEG) network derived from a mixed CNN-Transformer (MixFormer) feature extraction backbone to enhance medical image segmentation. The MixFormer network seamlessly integrates global and local information from Transformer and CNN architectures during the downsampling process. To comprehensively capture the interscale perspective, we introduced a multiscale spatial-aware fusion (MSAF) module, enabling effective interaction between coarse and fine feature representations. In addition, we proposed a mixed multibranch dilated attention (MMDA) module to bridge the semantic gap between encoding and decoding stages while emphasizing specific regions. Finally, we implemented a CNN-based upsampling approach to recover low-level features, substantially improving segmentation accuracy. Experimental validations on prevalent medical image datasets demonstrated the superior performance of MixFormer. On the Synapse dataset, our approach achieved a mean Dice similarity coefficient (DSC) of 82.64% and a mean Hausdorff distance (HD) of 12.67 mm. On the automated cardiac diagnosis challenge (ACDC) dataset, the DSC was 91.01%. On the international skin imaging collaboration (ISIC) 2018 dataset, the model achieved a mean intersection over union (mIoU) of 0.841, an accuracy of 0.958, a precision of 0.910, a recall of 0.934, and an F1 score of 0.913. For the Kvasir-SEG dataset, we recorded a mean Dice of 0.9247, an mIoU of 0.8615, a precision of 0.9181, and a recall of 0.9463. On the computer vision center (CVC)-ClinicDB dataset, the results were a mean Dice of 0.9441, an mIoU of 0.8922, a precision of 0.9437, and a recall of 0.9458. 
These findings underscore the superior segmentation performance of MixFormer compared to most mainstream segmentation networks such as CNNs and other Transformer-based structures.
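The overlap metrics reported in the abstract (Dice, IoU, precision, recall) can be illustrated with a minimal sketch for a single pair of binary masks. This uses the standard textbook definitions, not the authors' evaluation code; the function names, the flattened-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration. Per-class and per-dataset averaging, as well as the Hausdorff distance, are not covered.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    return tp, fp, fn


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else tp / denom


def precision(pred: Sequence[int], target: Sequence[int]) -> float:
    """Precision: TP / (TP + FP); defined as 0.0 when nothing is predicted."""
    tp, fp, _ = confusion_counts(pred, target)
    return 0.0 if tp + fp == 0 else tp / (tp + fp)


def recall(pred: Sequence[int], target: Sequence[int]) -> float:
    """Recall: TP / (TP + FN); defined as 0.0 when the target is empty."""
    tp, _, fn = confusion_counts(pred, target)
    return 0.0 if tp + fn == 0 else tp / (tp + fn)


# Toy example: a 6-pixel mask flattened to a list (hypothetical data).
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
print(dice(pred_mask, true_mask), iou(pred_mask, true_mask))
```

For a single mask pair the two overlap scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That identity does not carry over to dataset-level means, which is one reason a record like this one reports both.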
Pages: 20