Triple-Supervised Convolutional Transformer Aggregation for Robust Monocular Endoscopic Dense Depth Estimation

Cited by: 1
Authors
Fan, Wenkang [1 ]
Jiang, Wenjing [1 ]
Shi, Hong [2 ]
Zeng, Hui-Qing [3 ]
Chen, Yinran [1 ]
Luo, Xiongbiao [1 ]
Affiliations
[1] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Comp Sci & Technol, Xiamen 361005, Peoples R China
[2] Fujian Med Univ, Canc Hosp, Fuzhou 350014, Peoples R China
[3] Xiamen Univ, Zhongshan Hosp, Xiamen 361004, Peoples R China
Source
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Estimation; Convolution; Convolutional codes; Lighting; Unsupervised learning; Monocular depth estimation; vision transformers; self-supervised learning; robotic-assisted endoscopy
DOI
10.1109/TMRB.2024.3407384
Chinese Library Classification (CLC)
R318 [Biomedical Engineering]
Subject Classification Code
0831
Abstract
Accurate, deeply learned dense depth prediction remains a challenge for monocular vision reconstruction. Compared with monocular depth estimation from natural images, endoscopic dense depth prediction is even more challenging: not only is it difficult to annotate endoscopic video data for supervised learning, but endoscopic video images also suffer from illumination variations (a limited light source, a limited field of view, and specular highlights) and from smooth, textureless surfaces in complex surgical fields. This work explores a new deep learning framework of triple-supervised convolutional transformer aggregation (TSCTA) for monocular endoscopic dense depth recovery without annotating any data. Specifically, TSCTA creates convolutional transformer aggregation networks with a new hybrid encoder that combines dense convolution and scalable transformers to extract local texture features and global spatial-temporal features in parallel, and it builds a local-global aggregation decoder to effectively fuse the two kinds of features from coarse to fine. Moreover, we develop a self-supervised learning framework with triple supervision, which integrates minimum photometric consistency and depth consistency with sparse depth self-supervision to train our model on unannotated data. We evaluated TSCTA on unannotated monocular endoscopic images collected from various surgical procedures; the experimental results show that our method achieves a more accurate depth range, a more complete depth distribution, more sufficient textures, and better qualitative and quantitative assessment results than state-of-the-art deeply learned monocular dense depth estimation methods.
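To make the triple supervision concrete, below is a minimal PyTorch sketch of such an objective. It is an illustration under stated assumptions, not the authors' implementation: the warping of source images and depth maps into the target view is assumed to happen upstream, and all function names, tensor shapes, and loss weights here are hypothetical.

```python
import torch

def min_photometric_loss(target, warped_sources):
    # Per-pixel minimum reprojection error over all source views warped
    # into the target view; taking the minimum suppresses occlusions and
    # view-dependent artifacts such as specular highlights.
    # target: (B, 3, H, W); warped_sources: list of (B, 3, H, W) tensors.
    errs = [(target - w).abs().mean(dim=1, keepdim=True) for w in warped_sources]
    return torch.cat(errs, dim=1).min(dim=1).values.mean()

def depth_consistency_loss(depth_t, depth_s_warped, eps=1e-7):
    # Normalized disagreement between the predicted target depth and a
    # source depth map warped into the target view.
    diff = (depth_t - depth_s_warped).abs()
    return (diff / (depth_t + depth_s_warped).clamp(min=eps)).mean()

def sparse_depth_loss(depth_pred, depth_sparse, eps=1e-7):
    # Self-supervision from sparse depth (e.g., reconstructed SfM points),
    # applied only where a sparse measurement exists. Monocular depth is
    # scale-ambiguous, so the prediction is first aligned by median ratio.
    mask = depth_sparse > 0
    if not mask.any():
        return depth_pred.new_zeros(())
    scale = (depth_sparse[mask] / depth_pred[mask].clamp(min=eps)).median()
    return (scale * depth_pred[mask] - depth_sparse[mask]).abs().mean()

def triple_supervised_loss(target, warped_sources, depth_t, depth_s_warped,
                           depth_sparse, weights=(1.0, 0.5, 0.1)):
    # Weighted sum of the three supervision signals; the weights are
    # placeholders, not values taken from the paper.
    w_pc, w_dc, w_sd = weights
    return (w_pc * min_photometric_loss(target, warped_sources)
            + w_dc * depth_consistency_loss(depth_t, depth_s_warped)
            + w_sd * sparse_depth_loss(depth_t, depth_sparse))
```

In practice, the photometric term is usually an SSIM/L1 blend and an edge-aware smoothness regularizer is often added; both are left out here to keep the sketch short.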
Pages: 1017-1029
Number of pages: 13
Related Papers
50 records in total
  • [41] A Self-Supervised Network-Based Smoke Removal and Depth Estimation for Monocular Endoscopic Videos
    Zhang, Guo
    Gao, Xinbo
    Meng, Hongying
    Pang, Yu
    Nie, Xixi
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (09) : 6547 - 6559
  • [42] Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry
    Francani, Andre O.
    Maximo, Marcos R. O. A.
    2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 312 - 317
  • [43] MDEConvFormer: estimating monocular depth as soft regression based on convolutional transformer
    Su, Wen
    He, Ye
    Zhang, Haifeng
    Yang, Wenzhen
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (26) : 68793 - 68811
  • [44] Long-term reprojection loss for self-supervised monocular depth estimation in endoscopic surgery
    Shi, Xiaowei
    Cui, Beilei
    Clarkson, Matthew J.
    Islam, Mobarakol
ARTIFICIAL INTELLIGENCE SURGERY, 2024, 4 (03): 247 - 257
  • [45] Channel Interaction and Transformer Depth Estimation Network: Robust Self-Supervised Depth Estimation Under Varied Weather Conditions
    Liu, Jianqiang
    Guo, Zhengyu
    Ping, Peng
    Zhang, Hao
    Shi, Quan
    SUSTAINABILITY, 2024, 16 (20)
  • [46] Simultaneous Monocular Endoscopic Dense Depth and Odometry Estimation Using Local-Global Integration Networks
    Fan, Wenkang
    Jiang, Wenjing
    Fang, Hao
    Shi, Hong
    Chen, Jianhua
    Luo, Xiongbiao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VI, 2024, 15006 : 564 - 574
  • [47] SC-DepthV3: Robust Self-Supervised Monocular Depth Estimation for Dynamic Scenes
    Sun, Libo
    Bian, Jia-Wang
    Zhan, Huangying
    Yin, Wei
    Reid, Ian
    Shen, Chunhua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 497 - 508
  • [48] ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation
    Xing, Daitao
    Shen, Jinglin
    Ho, Chiuman
    Tzes, Anthony
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 2983 - 2991
  • [49] Confidence-aware self-supervised learning for dense monocular depth estimation in dynamic laparoscopic scene
    Hirohata, Yasuhide
    Sogabe, Maina
    Miyazaki, Tetsuro
    Kawase, Toshihiro
    Kawashima, Kenji
SCIENTIFIC REPORTS, 2023, 13 (01)