InCoLoTransNet: An Involution-Convolution and Locality Attention-Aware Transformer for Precise Colorectal Polyp Segmentation in GI Images

Cited by: 0
Authors
Oukdach, Yassine [1 ]
Garbaz, Anass [1 ]
Kerkaou, Zakaria [1 ]
El Ansari, Mohamed [2 ]
Koutti, Lahcen [1 ]
El Ouafdi, Ahmed Fouad [1 ]
Salihoun, Mouna [3 ]
Affiliations
[1] Ibnou Zohr Univ, Fac Sci, Dept Comp Sci, LabSIV, Agadir 80000, Morocco
[2] Moulay Ismail Univ, Fac Sci, Dept Comp Sci, Informat & Applicat Lab, BP 11201, Meknes 52000, Morocco
[3] Mohammed V Univ Rabat, Fac Med, Pharm Rabat, Rabat 10000, Morocco
Keywords
Polyp segmentation; Vision transformer; CNN; Involution; Attention; GI images; CAPSULE ENDOSCOPY;
DOI
10.1007/s10278-025-01389-7
CLC classification: R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline codes: 1002; 100207; 1009
Abstract
Gastrointestinal (GI) disease examination presents significant challenges to doctors due to the intricate structure of the human digestive system. Colonoscopy and wireless capsule endoscopy are the most commonly used tools for GI examination. However, the large amount of data generated by these technologies requires the expertise and intervention of doctors for disease identification, making manual analysis a very time-consuming task. Thus, the development of a computer-assisted system is highly desirable to assist clinical professionals in making decisions in a low-cost and effective way. In this paper, we introduce a novel framework called InCoLoTransNet, designed for polyp segmentation. The study is based on a transformer and convolution-involution neural network, following the encoder-decoder architecture. We employ a vision transformer in the encoder to capture global context, while the decoder uses a convolution-involution collaboration to resample the polyp features. Involution enhances the model's ability to adaptively capture spatial and contextual information, while convolution focuses on local information, leading to more accurate feature extraction. The essential features captured by the transformer encoder are passed to the decoder through two skip connection pathways. A convolutional block attention module (CBAM) refines the features and passes them to the convolution block, leveraging attention mechanisms to emphasize relevant information. Meanwhile, locality self-attention passes essential features to the involution block, reinforcing the model's ability to capture more global features in the polyp regions. Experiments were conducted on five public datasets: CVC-ClinicDB, CVC-ColonDB, Kvasir-SEG, Etis-LaribPolypDB, and CVC-300.
InCoLoTransNet compares favorably with 15 state-of-the-art polyp segmentation methods, achieving the highest mean Dice score of 93% and a mean intersection over union of 90% on CVC-ColonDB. InCoLoTransNet also distinguishes itself in generalization performance, achieving high mean Dice coefficient and mean intersection over union scores on unseen datasets: 85% and 79% on CVC-ColonDB, 91% and 87% on CVC-300, and 79% and 70% on Etis-LaribPolypDB, respectively.
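The abstract's contrast between convolution (spatially shared, channel-specific kernels) and involution (spatially specific, channel-shared kernels) can be illustrated with a minimal sketch. The paper's exact kernel-generation function is not given in the abstract, so the linear map `kernel_gen` below is a hypothetical stand-in for the learned bottleneck of a real involution block; the operation itself follows the general involution formulation.

```python
import numpy as np

def involution2d(x, kernel_gen, K=3, groups=1):
    """Minimal involution: a KxK kernel is generated per spatial location
    (here from the centre pixel via kernel_gen) and shared across all
    channels of a group - the inverse of convolution's weight sharing."""
    C, H, W = x.shape
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    cg = C // groups  # channels per group
    for i in range(H):
        for j in range(W):
            # kernel_gen maps the C-dim centre feature to (groups, K, K)
            kern = kernel_gen(x[:, i, j]).reshape(groups, K, K)
            patch = xp[:, i:i + K, j:j + K]  # (C, K, K) neighbourhood
            for g in range(groups):
                out[g * cg:(g + 1) * cg, i, j] = (
                    patch[g * cg:(g + 1) * cg] * kern[g]
                ).sum(axis=(1, 2))
    return out

# Toy kernel-generation: a fixed random linear map standing in for the
# learned function of the paper's involution blocks (illustrative only).
rng = np.random.default_rng(0)
C, K, G = 4, 3, 2
W_gen = rng.standard_normal((G * K * K, C)) * 0.1
x = rng.standard_normal((C, 8, 8))
y = involution2d(x, lambda v: W_gen @ v, K=K, groups=G)
print(y.shape)  # (4, 8, 8) - spatial size preserved, like a padded conv
```

Because the kernel depends on the local content at each position, involution adapts spatially in a way a fixed convolution kernel cannot, which is the property the paper exploits for capturing context in polyp regions.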
Pages: 21
Related Papers (1)
  • [1] PDTANet: a context-guided and attention-aware deep learning method for tumor segmentation of guinea pig colorectal OCT images
    Lyu, Jing
    Ren, Lin
    Liu, Qinying
    Wang, Yan
    Zhou, Zhenqiao
    Chen, Yueyan
    Jia, Hongbo
    Tang, Yuguo
    Li, Min
    OPTICS CONTINUUM, 2023, 2 (07): : 1716 - 1734