Triple-Supervised Convolutional Transformer Aggregation for Robust Monocular Endoscopic Dense Depth Estimation

Cited by: 1
Authors
Fan, Wenkang [1 ]
Jiang, Wenjing [1 ]
Shi, Hong [2 ]
Zeng, Hui-Qing [3 ]
Chen, Yinran [1 ]
Luo, Xiongbiao [1 ]
Affiliations
[1] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Comp Sci & Technol, Xiamen 361005, Peoples R China
[2] Fujian Med Univ, Canc Hosp, Fuzhou 350014, Peoples R China
[3] Xiamen Univ, Zhongshan Hosp, Xiamen 361004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Estimation; Convolution; Convolutional codes; Lighting; Unsupervised learning; Monocular depth estimation; vision transformers; self-supervised learning; robotic-assisted endoscopy;
DOI
10.1109/TMRB.2024.3407384
CLC Number
R318 [Biomedical Engineering]
Discipline Code
0831
Abstract
Accurate deep-learned dense depth prediction remains a challenge for monocular vision reconstruction. Compared to monocular depth estimation from natural images, endoscopic dense depth prediction is even more challenging: endoscopic video data are difficult to annotate for supervised learning, and endoscopic video images suffer from illumination variations (a limited lighting source, a limited field of view, and specular highlights) as well as smooth, textureless surfaces in complex surgical fields. This work explores a new deep learning framework of triple-supervised convolutional transformer aggregation (TSCTA) for monocular endoscopic dense depth recovery without annotating any data. Specifically, TSCTA creates convolutional transformer aggregation networks with a new hybrid encoder that combines dense convolution and scalable transformers to extract local texture features and global spatial-temporal features in parallel, and builds a local and global aggregation decoder to effectively aggregate global and local features from coarse to fine. Moreover, we develop a self-supervised learning framework with triple supervision, which integrates minimum photometric consistency and depth consistency with sparse depth self-supervision to train our model on unannotated data. We evaluated TSCTA on unannotated monocular endoscopic images collected from various surgical procedures; the experimental results show that our method achieves a more accurate depth range, a more complete depth distribution, more sufficient textures, and better qualitative and quantitative assessment results than state-of-the-art deep-learned monocular dense depth estimation methods.
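The triple supervision described above combines three self-supervised terms: a minimum photometric reprojection error over warped source views, a depth consistency term between frames, and a sparse depth term (e.g., from structure-from-motion points). The sketch below is a minimal NumPy illustration of how such a combined loss could be assembled; the function names, loss weights, and L1 formulations are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def min_photometric_loss(target, warped_views):
    """Per-pixel minimum reprojection error over source views.

    target: (H, W, 3) reference frame; warped_views: list of (H, W, 3)
    source frames warped into the reference view.
    """
    errors = np.stack([np.abs(target - w).mean(axis=-1) for w in warped_views])
    return float(errors.min(axis=0).mean())  # min over views, then mean over pixels

def depth_consistency_loss(depth, warped_depth):
    """L1 difference between predicted depth and depth warped from another frame."""
    return float(np.abs(depth - warped_depth).mean())

def sparse_depth_loss(pred_depth, sparse_depth, valid_mask):
    """L1 error only at pixels where sparse reference depth is available."""
    return float(np.abs(pred_depth - sparse_depth)[valid_mask].mean())

def triple_supervised_loss(target, warped_views, depth, warped_depth,
                           sparse_depth, valid_mask,
                           w_photo=1.0, w_depth=0.1, w_sparse=0.5):
    """Weighted sum of the three supervision terms (weights are illustrative)."""
    return (w_photo * min_photometric_loss(target, warped_views)
            + w_depth * depth_consistency_loss(depth, warped_depth)
            + w_sparse * sparse_depth_loss(depth, sparse_depth, valid_mask))
```

In a training loop, the warped views and warped depth would come from differentiably reprojecting source frames using the predicted depth and relative camera pose, so the whole objective is computed from unannotated video alone.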
Pages: 1017-1029 (13 pages)