Triple-Supervised Convolutional Transformer Aggregation for Robust Monocular Endoscopic Dense Depth Estimation

Cited by: 1
Authors
Fan, Wenkang [1 ]
Jiang, Wenjing [1 ]
Shi, Hong [2 ]
Zeng, Hui-Qing [3 ]
Chen, Yinran [1 ]
Luo, Xiongbiao [1 ]
Affiliations
[1] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Comp Sci & Technol, Xiamen 361005, Peoples R China
[2] Fujian Med Univ, Canc Hosp, Fuzhou 350014, Peoples R China
[3] Xiamen Univ, Zhongshan Hosp, Xiamen 361004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Estimation; Convolution; Convolutional codes; Lighting; Unsupervised learning; Monocular depth estimation; vision transformers; self-supervised learning; robotic-assisted endoscopy;
DOI
10.1109/TMRB.2024.3407384
CLC Number
R318 [Biomedical Engineering];
Discipline Code
0831;
Abstract
Accurate deeply learned dense depth prediction remains a challenge for monocular vision reconstruction. Compared to monocular depth estimation from natural images, endoscopic dense depth prediction is even more challenging: not only is it difficult to annotate endoscopic video data for supervised learning, but endoscopic video images also suffer from illumination variations (a limited lighting source, a limited field of view, and specular highlights) and from smooth, textureless surfaces in complex surgical fields. This work explores a new deep learning framework of triple-supervised convolutional transformer aggregation (TSCTA) for monocular endoscopic dense depth recovery without annotating any data. Specifically, TSCTA creates convolutional transformer aggregation networks with a new hybrid encoder that combines dense convolutions and scalable transformers to extract local texture features and global spatial-temporal features in parallel, and builds a local and global aggregation decoder that effectively aggregates global and local features from coarse to fine. Moreover, we develop a self-supervised learning framework with triple supervision, which integrates minimum photometric consistency and depth consistency with sparse depth self-supervision to train our model on unannotated data. We evaluated TSCTA on unannotated monocular endoscopic images collected from various surgical procedures; the experimental results show that our method achieves a more accurate depth range, a more complete depth distribution, more sufficient textures, and better qualitative and quantitative assessment results than state-of-the-art deeply learned monocular dense depth estimation methods.
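The triple supervision described in the abstract combines three self-supervised signals: a minimum photometric consistency term, a depth consistency term, and a sparse depth term. The paper's exact formulations and weights are not given in this record, so the following is only a minimal NumPy sketch under common assumptions from the self-supervised depth literature: per-pixel minimum over reprojection errors (as in Monodepth2-style min-reprojection), a scale-normalized depth difference for consistency, and an L1 term evaluated only at pixels with sparse (e.g., SfM-derived) depth. The function names and the weights `w_photo`, `w_cons`, `w_sparse` are hypothetical.

```python
import numpy as np

def min_photometric_loss(errors):
    # Hypothetical sketch: per-pixel minimum over reprojection error maps
    # computed from multiple source frames, then averaged over the image.
    return float(np.minimum.reduce(errors).mean())

def depth_consistency_loss(depth_a, depth_b, eps=1e-7):
    # Scale-normalized absolute difference between two depth predictions
    # of the same view (e.g., aligned predictions from adjacent frames).
    return float((np.abs(depth_a - depth_b) / (depth_a + depth_b + eps)).mean())

def sparse_depth_loss(pred, sparse, mask):
    # Supervise only at pixels where sparse depth is available.
    return float(np.abs(pred[mask] - sparse[mask]).mean())

def triple_supervised_loss(errors, depth_a, depth_b, pred, sparse, mask,
                           w_photo=1.0, w_cons=0.1, w_sparse=0.5):
    # Weighted sum of the three supervision signals; weights are assumptions.
    return (w_photo * min_photometric_loss(errors)
            + w_cons * depth_consistency_loss(depth_a, depth_b)
            + w_sparse * sparse_depth_loss(pred, sparse, mask))
```

In a training loop, `errors` would hold photometric residual maps from view synthesis, and `sparse`/`mask` would come from a sparse reconstruction of the endoscopic sequence; here they are plain arrays for illustration.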
Pages: 1017 - 1029
Page count: 13
Related Papers
50 records
  • [21] On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation
    Haimei Zhao
    Jing Zhang
    Zhuo Chen
    Bo Yuan
    Dacheng Tao
    Machine Intelligence Research, 2024, 21 : 495 - 513
  • [22] Enhancing Self-supervised Monocular Depth Estimation via Incorporating Robust Constraints
    Li, Rui
    He, Xiantuo
    Zhu, Yu
    Li, Xianjun
    Sun, Jinqiu
    Zhang, Yanning
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3108 - 3117
  • [23] Self-supervised monocular depth estimation with self-distillation and dense skip connection
    Xiang, Xuezhi
    Li, Wei
    Wang, Yao
    El Saddik, Abdulmotaleb
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 246
  • [24] DTTNet: Depth Transverse Transformer Network for Monocular Depth Estimation
    Kamath, Shreyas K. M.
    Rajeev, Srijith
    Panetta, Karen
    Agaian, Sos S.
    MULTIMODAL IMAGE EXPLOITATION AND LEARNING 2022, 2022, 12100
  • [25] Integrating convolutional guidance and Transformer fusion with Markov Random Fields smoothing for monocular depth estimation
    Peng, Xiaorui
    Meng, Yu
    Shi, Boqiang
    Zheng, Chao
    Wang, Meijun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 143
  • [26] Digging Into Self-Supervised Monocular Depth Estimation
    Godard, Clement
    Mac Aodha, Oisin
    Firman, Michael
    Brostow, Gabriel
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3827 - 3837
  • [27] Self-supervised monocular depth estimation in fog
    Tao, Bo
    Hu, Jiaxin
    Jiang, Du
    Li, Gongfa
    Chen, Baojia
    Qian, Xinbo
    OPTICAL ENGINEERING, 2023, 62 (03)
  • [28] On the uncertainty of self-supervised monocular depth estimation
    Poggi, Matteo
    Aleotti, Filippo
    Tosi, Fabio
    Mattoccia, Stefano
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3224 - 3234
  • [29] Revisiting Self-supervised Monocular Depth Estimation
    Kim, Ue-Hwan
    Lee, Gyeong-Min
    Kim, Jong-Hwan
    ROBOT INTELLIGENCE TECHNOLOGY AND APPLICATIONS 6, 2022, 429 : 336 - 350
  • [30] Semi-Supervised Adversarial Monocular Depth Estimation
    Ji, Rongrong
    Li, Ke
    Wang, Yan
    Sun, Xiaoshuai
    Guo, Feng
    Guo, Xiaowei
    Wu, Yongjian
    Huang, Feiyue
    Luo, Jiebo
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (10) : 2410 - 2422