Triple-Supervised Convolutional Transformer Aggregation for Robust Monocular Endoscopic Dense Depth Estimation

Cited by: 1
Authors
Fan, Wenkang [1 ]
Jiang, Wenjing [1 ]
Shi, Hong [2 ]
Zeng, Hui-Qing [3 ]
Chen, Yinran [1 ]
Luo, Xiongbiao [1 ]
Affiliations
[1] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Comp Sci & Technol, Xiamen 361005, Peoples R China
[2] Fujian Med Univ, Canc Hosp, Fuzhou 350014, Peoples R China
[3] Xiamen Univ, Zhongshan Hosp, Xiamen 361004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Estimation; Convolution; Convolutional codes; Lighting; Unsupervised learning; Monocular depth estimation; vision transformers; self-supervised learning; robotic-assisted endoscopy;
DOI
10.1109/TMRB.2024.3407384
CLC Number
R318 [Biomedical Engineering]
Discipline Code
0831
Abstract
Accurate deep-learned dense depth prediction remains a challenge for monocular vision reconstruction. Compared to monocular depth estimation from natural images, endoscopic dense depth prediction is even more challenging: endoscopic video data are difficult to annotate for supervised learning, and endoscopic video images suffer from illumination variations (a limited lighting source, a limited field of view, and specular highlights) as well as smooth, textureless surfaces in complex surgical fields. This work explores a new deep learning framework of triple-supervised convolutional transformer aggregation (TSCTA) for monocular endoscopic dense depth recovery without annotating any data. Specifically, TSCTA creates convolutional transformer aggregation networks with a new hybrid encoder that combines dense convolution and scalable transformers to extract local texture features and global spatial-temporal features in parallel, and builds a local and global aggregation decoder to effectively aggregate global and local features from coarse to fine. Moreover, we develop a self-supervised learning framework with triple supervision, which integrates minimum photometric consistency and depth consistency with sparse depth self-supervision to train our model on unannotated data. We evaluated TSCTA on unannotated monocular endoscopic images collected from various surgical procedures; the experimental results show that our method achieves a more accurate depth range, a more complete depth distribution, more sufficient textures, and better qualitative and quantitative assessment results than state-of-the-art deep-learned monocular dense depth estimation methods.
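The triple supervision described above combines three self-supervised terms: a minimum photometric reprojection error over warped source views, a depth consistency term between frames, and a sparse depth term (e.g., from structure-from-motion points). The sketch below is a minimal NumPy illustration of how such a combined loss could be assembled; the function names, loss weights, and L1 formulations are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def min_photometric_loss(target, warped_views):
    """Per-pixel minimum reprojection error over source views.

    target: (H, W, 3) reference frame; warped_views: list of (H, W, 3)
    source frames warped into the reference view.
    """
    errors = np.stack([np.abs(target - w).mean(axis=-1) for w in warped_views])
    return float(errors.min(axis=0).mean())  # min over views, then mean over pixels

def depth_consistency_loss(depth, warped_depth):
    """L1 difference between predicted depth and depth warped from another frame."""
    return float(np.abs(depth - warped_depth).mean())

def sparse_depth_loss(pred_depth, sparse_depth, valid_mask):
    """L1 error only at pixels where sparse reference depth is available."""
    return float(np.abs(pred_depth - sparse_depth)[valid_mask].mean())

def triple_supervised_loss(target, warped_views, depth, warped_depth,
                           sparse_depth, valid_mask,
                           w_photo=1.0, w_depth=0.1, w_sparse=0.5):
    """Weighted sum of the three supervision terms (weights are illustrative)."""
    return (w_photo * min_photometric_loss(target, warped_views)
            + w_depth * depth_consistency_loss(depth, warped_depth)
            + w_sparse * sparse_depth_loss(depth, sparse_depth, valid_mask))
```

In a training loop, the warped views and warped depth would come from differentiably reprojecting source frames using the predicted depth and relative camera pose, so the whole objective is computed from unannotated video alone.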
Pages: 1017-1029 (13 pages)