Triple-Supervised Convolutional Transformer Aggregation for Robust Monocular Endoscopic Dense Depth Estimation

Cited by: 1
Authors
Fan, Wenkang [1 ]
Jiang, Wenjing [1 ]
Shi, Hong [2 ]
Zeng, Hui-Qing [3 ]
Chen, Yinran [1 ]
Luo, Xiongbiao [1 ]
Affiliations
[1] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Comp Sci & Technol, Xiamen 361005, Peoples R China
[2] Fujian Med Univ, Canc Hosp, Fuzhou 350014, Peoples R China
[3] Xiamen Univ, Zhongshan Hosp, Xiamen 361004, Peoples R China
Source
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Estimation; Convolution; Convolutional codes; Lighting; Unsupervised learning; Monocular depth estimation; vision transformers; self-supervised learning; robotic-assisted endoscopy
DOI
10.1109/TMRB.2024.3407384
Chinese Library Classification (CLC)
R318 [Biomedical Engineering]
Subject Classification Code
0831
Abstract
Accurate, deeply learned dense depth prediction remains a challenge for monocular vision reconstruction. Compared with monocular depth estimation from natural images, endoscopic dense depth prediction is even more challenging: not only is it difficult to annotate endoscopic video data for supervised learning, but endoscopic video images also suffer from illumination variations (a limited light source, a limited field of view, and specular highlights) and from smooth, textureless surfaces in complex surgical fields. This work explores a new deep learning framework of triple-supervised convolutional transformer aggregation (TSCTA) for monocular endoscopic dense depth recovery without annotating any data. Specifically, TSCTA creates convolutional transformer aggregation networks with a new hybrid encoder that combines dense convolution and scalable transformers to extract local texture features and global spatial-temporal features in parallel, and it builds a local-global aggregation decoder to effectively fuse the two kinds of features from coarse to fine. Moreover, we develop a self-supervised learning framework with triple supervision, which integrates minimum photometric consistency and depth consistency with sparse depth self-supervision to train our model on unannotated data. We evaluated TSCTA on unannotated monocular endoscopic images collected from various surgical procedures; the experimental results show that our method achieves a more accurate depth range, a more complete depth distribution, more sufficient textures, and better qualitative and quantitative assessment results than state-of-the-art deeply learned monocular dense depth estimation methods.
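To make the triple supervision concrete, below is a minimal PyTorch sketch of such an objective. It is an illustration under stated assumptions, not the authors' implementation: the warping of source images and depth maps into the target view is assumed to happen upstream, and all function names, tensor shapes, and loss weights here are hypothetical.

```python
import torch

def min_photometric_loss(target, warped_sources):
    # Per-pixel minimum reprojection error over all source views warped
    # into the target view; taking the minimum suppresses occlusions and
    # view-dependent artifacts such as specular highlights.
    # target: (B, 3, H, W); warped_sources: list of (B, 3, H, W) tensors.
    errs = [(target - w).abs().mean(dim=1, keepdim=True) for w in warped_sources]
    return torch.cat(errs, dim=1).min(dim=1).values.mean()

def depth_consistency_loss(depth_t, depth_s_warped, eps=1e-7):
    # Normalized disagreement between the predicted target depth and a
    # source depth map warped into the target view.
    diff = (depth_t - depth_s_warped).abs()
    return (diff / (depth_t + depth_s_warped).clamp(min=eps)).mean()

def sparse_depth_loss(depth_pred, depth_sparse, eps=1e-7):
    # Self-supervision from sparse depth (e.g., reconstructed SfM points),
    # applied only where a sparse measurement exists. Monocular depth is
    # scale-ambiguous, so the prediction is first aligned by median ratio.
    mask = depth_sparse > 0
    if not mask.any():
        return depth_pred.new_zeros(())
    scale = (depth_sparse[mask] / depth_pred[mask].clamp(min=eps)).median()
    return (scale * depth_pred[mask] - depth_sparse[mask]).abs().mean()

def triple_supervised_loss(target, warped_sources, depth_t, depth_s_warped,
                           depth_sparse, weights=(1.0, 0.5, 0.1)):
    # Weighted sum of the three supervision signals; the weights are
    # placeholders, not values taken from the paper.
    w_pc, w_dc, w_sd = weights
    return (w_pc * min_photometric_loss(target, warped_sources)
            + w_dc * depth_consistency_loss(depth_t, depth_s_warped)
            + w_sd * sparse_depth_loss(depth_t, depth_sparse))
```

In practice, the photometric term is usually an SSIM/L1 blend and an edge-aware smoothness regularizer is often added; both are left out here to keep the sketch short.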
Pages: 1017-1029
Number of pages: 13
Related Papers
50 records in total
  • [41] A Self-Supervised Network-Based Smoke Removal and Depth Estimation for Monocular Endoscopic Videos
    Zhang, Guo
    Gao, Xinbo
    Meng, Hongying
    Pang, Yu
    Nie, Xixi
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (09) : 6547 - 6559
  • [42] Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry
    Francani, Andre O.
    Maximo, Marcos R. O. A.
    2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 312 - 317
  • [43] MDEConvFormer: estimating monocular depth as soft regression based on convolutional transformer
    Su, Wen
    He, Ye
    Zhang, Haifeng
    Yang, Wenzhen
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (26) : 68793 - 68811
  • [44] Long-term reprojection loss for self-supervised monocular depth estimation in endoscopic surgery
    Shi, Xiaowei
    Cui, Beilei
    Clarkson, Matthew J.
    Islam, Mobarakol
ARTIFICIAL INTELLIGENCE SURGERY, 2024, 4 (03): 247 - 257
  • [45] Channel Interaction and Transformer Depth Estimation Network: Robust Self-Supervised Depth Estimation Under Varied Weather Conditions
    Liu, Jianqiang
    Guo, Zhengyu
    Ping, Peng
    Zhang, Hao
    Shi, Quan
    SUSTAINABILITY, 2024, 16 (20)
  • [46] Simultaneous Monocular Endoscopic Dense Depth and Odometry Estimation Using Local-Global Integration Networks
    Fan, Wenkang
    Jiang, Wenjing
    Fang, Hao
    Shi, Hong
    Chen, Jianhua
    Luo, Xiongbiao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VI, 2024, 15006 : 564 - 574
  • [47] SC-DepthV3: Robust Self-Supervised Monocular Depth Estimation for Dynamic Scenes
    Sun, Libo
    Bian, Jia-Wang
    Zhan, Huangying
    Yin, Wei
    Reid, Ian
    Shen, Chunhua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 497 - 508
  • [48] ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation
    Xing, Daitao
    Shen, Jinglin
    Ho, Chiuman
    Tzes, Anthony
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 2983 - 2991
  • [49] Confidence-aware self-supervised learning for dense monocular depth estimation in dynamic laparoscopic scene
    Hirohata, Yasuhide
    Sogabe, Maina
    Miyazaki, Tetsuro
    Kawase, Toshihiro
    Kawashima, Kenji
SCIENTIFIC REPORTS, 2023, 13 (01)