HCTA-Net: A Hybrid CNN-Transformer Attention Network for Surgical Instrument Segmentation

Cited by: 2

Authors:

Yang, Lei [1]

Wang, Hongyong [1]

Bian, Guibin [1,2]

Liu, Yanhong [1]

Affiliations:

[1] Zhengzhou Univ, Sch Elect Engn, Zhengzhou 450001, Henan, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / Issue 04

Funding:

National Natural Science Foundation of China;

Keywords:

Image segmentation; Feature extraction; Instruments; Transformers; Task analysis; Surgery; Robots; Surgical instruments; Deep architecture; Medical robotics; surgical instrument segmentation; transformer; residual network; deep supervision; FEATURE AGGREGATION; IMAGES;

DOI:

10.1109/TMRB.2023.3315479

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831 ;

Abstract:

Surgical robots play an increasingly important role in surgery, and accurate surgical instrument segmentation is one of the important prerequisites for their stable operation. However, this task faces several challenging factors, such as scale variation and specular reflection. Recently, transformers have shown superior performance in image segmentation owing to their strong ability to capture long-range dependencies. However, they do not capture locality and translation invariance well. In this paper, combining the advantages of transformers and CNNs, a hybrid CNN-Transformer attention network, named HCTA-Net, is proposed for automatic surgical instrument segmentation. To extract more comprehensive feature information from surgical images, a dual-path encoding unit is proposed for effective representation of local detail features and global context. Meanwhile, an attention-based feature enhancement (AFE) module is proposed so that the features of the two encoding paths complement each other. In addition, to mitigate the limited processing capacity of simple connections, a multi-dimension attention (MDA) module is built to process the intermediate features along three directions (width, height, and space), filtering out interfering features while emphasizing the key regions of local feature maps. Further, an additive attention enhancement (AAE) module is introduced to further enhance local feature maps. Finally, to obtain more multi-scale global information, a multi-scale context fusion (MCF) module is proposed at the bottleneck layer to obtain different receptive fields and enrich the feature representation. Experimental results show that the proposed HCTA-Net achieves superior segmentation performance on surgical instruments compared with other state-of-the-art (SOTA) segmentation models.

Citation

Pages: 929-944

Number of pages: 16
