Triple-Supervised Convolutional Transformer Aggregation for Robust Monocular Endoscopic Dense Depth Estimation

Cited by: 1
Authors
Fan, Wenkang [1 ]
Jiang, Wenjing [1 ]
Shi, Hong [2 ]
Zeng, Hui-Qing [3 ]
Chen, Yinran [1 ]
Luo, Xiongbiao [1 ]
Affiliations
[1] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Comp Sci & Technol, Xiamen 361005, Peoples R China
[2] Fujian Med Univ, Canc Hosp, Fuzhou 350014, Peoples R China
[3] Xiamen Univ, Zhongshan Hosp, Xiamen 361004, Peoples R China
Source
IEEE Transactions on Medical Robotics and Bionics
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Estimation; Convolution; Convolutional codes; Lighting; Unsupervised learning; Monocular depth estimation; Vision transformers; Self-supervised learning; Robotic-assisted endoscopy
DOI
10.1109/TMRB.2024.3407384
CLC Number
R318 [Biomedical Engineering]
Subject Classification Code
0831
Abstract
Accurate deep-learning-based dense depth prediction remains a challenge for monocular vision reconstruction. Compared with monocular depth estimation from natural images, endoscopic dense depth prediction is even more challenging: endoscopic video is difficult to annotate for supervised learning, and the images suffer from illumination variations (a limited light source, a limited field of view, and specular highlights) as well as smooth, textureless surfaces in complex surgical fields. This work explores a new deep learning framework, triple-supervised convolutional transformer aggregation (TSCTA), for monocular endoscopic dense depth recovery without annotating any data. Specifically, TSCTA builds convolutional transformer aggregation networks with a new hybrid encoder that combines dense convolutions and scalable transformers to extract local texture features and global spatial-temporal features in parallel, and with a local-global aggregation decoder that fuses these global and local features from coarse to fine. Moreover, we develop a self-supervised learning framework with triple supervision, integrating minimum photometric consistency and depth consistency with sparse depth self-supervision to train the model on unannotated data. We evaluated TSCTA on unannotated monocular endoscopic images collected from various surgical procedures; the results show that our method achieves a more accurate depth range, a more complete depth distribution, richer textures, and better qualitative and quantitative assessment results than state-of-the-art deep-learning-based monocular dense depth estimation methods.
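The architecture is described only at a high level above. As a concrete illustration, the following PyTorch sketch shows one way a hybrid encoder could run a convolutional branch (local texture) and a self-attention branch (global spatial context) in parallel at each stage, with a decoder that aggregates the two feature streams from coarse to fine. Every module name, channel width, and the additive fusion scheme here are illustrative assumptions, not the authors' actual TSCTA implementation.

```python
# Illustrative sketch only: a hybrid conv/transformer encoder with a
# coarse-to-fine local/global aggregation decoder, in the spirit of the
# architecture the abstract describes. Names, widths, and the fusion
# scheme are assumptions, not the paper's TSCTA network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridStage(nn.Module):
    """One encoder stage: a convolutional branch for local texture and a
    self-attention branch for global context, run in parallel."""
    def __init__(self, in_ch, out_ch, num_heads=4):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.GELU())
        self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=2)  # tokens at the same scale
        self.norm = nn.LayerNorm(out_ch)
        self.attn = nn.MultiheadAttention(out_ch, num_heads, batch_first=True)

    def forward(self, x):
        local = self.conv_branch(x)                  # (B, C, H/2, W/2) local texture
        t = self.proj(x)
        b, c, h, w = t.shape
        tokens = self.norm(t.flatten(2).transpose(1, 2))   # (B, HW, C)
        glob, _ = self.attn(tokens, tokens, tokens)        # global spatial context
        return local, glob.transpose(1, 2).reshape(b, c, h, w)

class AggregationDecoder(nn.Module):
    """Fuses per-stage (local, global) feature pairs from coarse to fine."""
    def __init__(self, chs=(64, 128, 256)):
        super().__init__()
        self.fuse = nn.ModuleList(nn.Conv2d(2 * c, c, 3, padding=1) for c in chs)
        self.up = nn.ModuleList(nn.Conv2d(chs[i + 1], chs[i], 1)
                                for i in range(len(chs) - 1))
        self.out = nn.Conv2d(chs[0], 1, 3, padding=1)

    def forward(self, feats):                        # feats ordered fine -> coarse
        x = self.fuse[-1](torch.cat(feats[-1], dim=1))
        for i in range(len(feats) - 2, -1, -1):
            f = self.fuse[i](torch.cat(feats[i], dim=1))
            x = f + self.up[i](F.interpolate(
                x, size=f.shape[-2:], mode="bilinear", align_corners=False))
        depth = torch.sigmoid(self.out(x))           # dense map at 1/2 resolution
        return F.interpolate(depth, scale_factor=2.0,
                             mode="bilinear", align_corners=False)

class HybridDepthNet(nn.Module):
    def __init__(self, chs=(64, 128, 256)):
        super().__init__()
        self.stem = nn.Conv2d(3, chs[0], 3, padding=1)
        ins = (chs[0],) + chs[:-1]
        self.stages = nn.ModuleList(HybridStage(i, o) for i, o in zip(ins, chs))
        self.decoder = AggregationDecoder(chs)

    def forward(self, img):
        x, feats = self.stem(img), []
        for stage in self.stages:
            local, glob = stage(x)
            feats.append((local, glob))
            x = local + glob                         # merged features feed the next stage
        return self.decoder(feats)

depth = HybridDepthNet()(torch.randn(1, 3, 64, 80))  # -> (1, 1, 64, 80)
```

The division of labor the sketch mirrors is the one the abstract states: convolutions preserve the high-frequency texture that attention tends to smooth away, while attention supplies the long-range context that pure convolutions lack on the sparsely textured surfaces of endoscopic scenes.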
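Likewise, the three supervision signals can be pictured as one combined objective. The sketch below assumes a Monodepth2-style per-pixel minimum over photometric errors, a relative depth-consistency term between the prediction and depths warped in from neighboring frames, and a median-scale-aligned L1 term against sparse points (e.g., from SfM). The loss weights, the plain L1 photometric error (implementations usually mix in SSIM), and the precomputed `warped_rgbs`/`warped_depths` inputs are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a triple-supervised objective in the spirit of the
# abstract: minimum photometric consistency, depth consistency, and
# sparse depth self-supervision. Weights and error forms are assumptions.
import torch

def triple_supervised_loss(depth_t, target_rgb, warped_rgbs, warped_depths,
                           sparse_depth, sparse_mask,
                           w_photo=1.0, w_dc=0.1, w_sparse=0.5):
    # 1) Minimum photometric consistency: per pixel, keep only the source
    #    view that best re-renders the target, which down-weights pixels
    #    hit by occlusion or moving specular highlights.
    errs = torch.stack([(w - target_rgb).abs().mean(1, keepdim=True)
                        for w in warped_rgbs])       # (S, B, 1, H, W)
    photo = errs.min(dim=0).values.mean()

    # 2) Depth consistency: the predicted depth should agree with the
    #    depths warped in from neighboring frames (relative error).
    dcs = torch.stack([(depth_t - wd).abs() / (depth_t + wd + 1e-7)
                       for wd in warped_depths])
    dc = dcs.min(dim=0).values.mean()

    # 3) Sparse depth self-supervision: median-align the prediction's
    #    scale to the sparse points, then penalize the residual where
    #    points exist (sparse_mask must select at least one pixel).
    scale = (sparse_depth[sparse_mask]
             / depth_t[sparse_mask].clamp(min=1e-7)).median()
    sparse = (scale * depth_t[sparse_mask]
              - sparse_depth[sparse_mask]).abs().mean()

    return w_photo * photo + w_dc * dc + w_sparse * sparse
```

Taking the per-pixel minimum rather than the average over source views is what makes such a photometric term robust to the occlusions and moving highlights the abstract identifies as the core difficulty of endoscopic footage; the median scale alignment reconciles the unknown scale of self-supervised depth with the sparse reconstruction.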
Pages: 1017 - 1029
Page count: 13
Related Papers
50 items in total (first 10 shown below)
  • [1] CATNet: Convolutional attention and transformer for monocular depth estimation
    Tang, Shuai
    Lu, Tongwei
    Liu, Xuanxuan
    Zhou, Huabing
    Zhang, Yanduo
    PATTERN RECOGNITION, 2024, 145
  • [2] Self-supervised Learning for Dense Depth Estimation in Monocular Endoscopy
    Liu, Xingtong
    Sinha, Ayushi
    Unberath, Mathias
    Ishii, Masaru
    Hager, Gregory D.
    Taylor, Russell H.
    Reiter, Austin
    OR 2.0 CONTEXT-AWARE OPERATING THEATERS, COMPUTER ASSISTED ROBOTIC ENDOSCOPY, CLINICAL IMAGE-BASED PROCEDURES, AND SKIN IMAGE ANALYSIS, OR 2.0 2018, 2018, 11041 : 128 - 138
  • [3] MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer
    Zhao, Chaoqiang
    Zhang, Youmin
    Poggi, Matteo
    Tosi, Fabio
    Guo, Xianda
    Zhu, Zheng
    Huang, Guan
    Tang, Yang
    Mattoccia, Stefano
    2022 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV, 2022: 668 - 678
  • [4] Self-supervised Cascade Training for Monocular Endoscopic Dense Depth Recovery
    Jiang, Wenjing
    Fan, Wenkang
    Chen, Jianhua
    Shi, Hong
    Luo, Xiongbiao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V, 2024, 14429 : 480 - 491
  • [5] Image Masking for Robust Self-Supervised Monocular Depth Estimation
    Chawla, Hemang
    Jeeveswaran, Kishaan
    Arani, Elahe
    Zonooz, Bahram
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023: 10054 - 10060
  • [6] Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances
    Guizilini, Vitor
    Li, Jie
    Ambrus, Rares
    Pillai, Sudeep
    Gaidon, Adrien
    CONFERENCE ON ROBOT LEARNING, VOL 100, 2019, 100
  • [7] TAMDepth: self-supervised monocular depth estimation with transformer and adapter modulation
    Li, Shaokang
    Lyu, Chengzhi
    Xia, Bin
    Chen, Ziheng
    Zhang, Lei
    VISUAL COMPUTER, 2024, 40 (10): 6797 - 6808
  • [8] Self-Supervised Monocular Depth Estimation Using Hybrid Transformer Encoder
    Hwang, Seung-Jun
    Park, Sung-Jun
    Baek, Joong-Hwan
    Kim, Byungkyu
    IEEE SENSORS JOURNAL, 2022, 22 (19) : 18762 - 18770
  • [9] TinyDepth: Lightweight self-supervised monocular depth estimation based on transformer
    Cheng, Zeyu
    Zhang, Yi
    Yu, Yang
    Song, Zhe
    Tang, Chengkai
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
  • [10] Dense Depth Estimation in Monocular Endoscopy With Self-Supervised Learning Methods
    Liu, Xingtong
    Sinha, Ayushi
    Ishii, Masaru
    Hager, Gregory D.
    Reiter, Austin
    Taylor, Russell H.
    Unberath, Mathias
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (05) : 1438 - 1447