Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy

Cited by: 0

Authors:

Solano, Pedro Esteban Chavarrias [1]

Bulpitt, Andrew [1]

Subramanian, Venkataraman [2, 3]

Ali, Sharib [1]

Affiliations:

[1] Univ Leeds, Fac Engn & Phys Sci, Sch Comp Sci, Leeds LS2 9JT, England

[2] Leeds Teaching Hosp NHS Trust, Dept Gastroenterol, Leeds, England

[3] St Jamess Univ Leeds, Leeds Inst Med Res, Div Gastroenterol & Surg Sci, Leeds, England

Source:

MEDICAL IMAGE ANALYSIS | 2025 / Volume 99

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Deep learning; Monocular depth estimation; Surface normal prediction; Multi-task learning; Cross-task consistency; 3D colonoscopy; QUANTIFICATION; SURFACE; MOTION;

DOI:

10.1016/j.media.2024.103379

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Colonoscopy screening is the gold standard procedure for assessing abnormalities in the colon and rectum, such as ulcers and cancerous polyps. Measuring the abnormal mucosal area and reconstructing it in 3D can help quantify the surveyed area and objectively evaluate disease burden. However, due to the complex topology of these organs and variable physical conditions (for example, lighting, large homogeneous texture, and image modality), estimating distance from the camera (i.e., depth) is highly challenging. Moreover, most colonoscopic video acquisition is monocular, making depth estimation a non-trivial problem. While methods in computer vision for depth estimation have been proposed and advanced on natural scene datasets, the efficacy of these techniques has not been widely quantified on colonoscopy datasets. As the colonic mucosa has several low-texture regions that are not well pronounced, learning representations from an auxiliary task can improve salient feature extraction, allowing estimation of accurate camera depths. In this work, we propose a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator decoder. Our depth estimator incorporates attention mechanisms to enhance global context awareness. We leverage the surface normal prediction to improve geometric feature extraction. We also apply a cross-task consistency loss between the two geometrically related tasks, surface normal and camera depth. We demonstrate an improvement of 15.75% in relative error and of 10.7% in delta(1.25) accuracy over the most accurate baseline, the state-of-the-art Big-to-Small (BTS) approach. All experiments are conducted on the recently released C3VD dataset, and thus we provide a first benchmark of state-of-the-art methods on this dataset.
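The two reported metrics and the idea of cross-task consistency can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the normals are derived from depth with a simple height-field (orthographic) approximation rather than the camera model a real pipeline would use, and the cosine-based penalty is only one common way to compare normal maps. The abstract does not specify the paper's actual loss.

```python
import math


def abs_rel_error(pred, gt):
    """Mean absolute relative error, mean(|pred - gt| / gt), over pixels with gt > 0."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) < threshold.

    Simplification: pixels with non-positive prediction or ground truth are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


def normals_from_depth(depth):
    """Approximate unit normals for the interior pixels of a 2D depth grid.

    Assumption: the depth map is treated as a height field z(x, y), so the
    normal is proportional to (-dz/dx, -dz/dy, 1), using central differences.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            row.append((-dzdx / norm, -dzdy / norm, 1.0 / norm))
        normals.append(row)
    return normals


def consistency_loss(normals_a, normals_b):
    """Mean (1 - cosine similarity) between two grids of unit normals."""
    total, count = 0.0, 0
    for row_a, row_b in zip(normals_a, normals_b):
        for a, b in zip(row_a, row_b):
            total += 1.0 - sum(u * v for u, v in zip(a, b))
            count += 1
    return total / count
```

In a multi-task setting, a term like `consistency_loss(normals_from_depth(predicted_depth), predicted_normals)` would penalize disagreement between the two decoder outputs; it would typically be added to the per-task losses with a weighting factor chosen by the practitioner.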


Pages: 16
