SVT-SDE: Spatiotemporal Vision Transformers-Based Self-Supervised Depth Estimation in Stereoscopic Surgical Videos

Times cited: 6

Authors:

Tao, Rong [1]

Huang, Baoru [2]

Zou, Xiaoyang [1]

Zheng, Guoyan [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Biomed Engn, Inst Med Robot, Shanghai 200240, Peoples R China

[2] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London SW7 2AZ, England

Source:

IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Estimation; Image reconstruction; Videos; Surgery; Spatiotemporal phenomena; Feature extraction; Cameras; Depth estimation; surgical videos; spatiotemporal vision transformers; unsupervised; DEFORMATION RECOVERY; RECONSTRUCTION; NETWORKS; SURGERY;

DOI:

10.1109/TMRB.2023.3237867

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831;

Abstract:

Dense depth estimation plays a crucial role in developing context-aware computer-assisted intervention systems. However, it is a challenging task due to low image quality and the highly dynamic surgical environment. The task is further complicated by the difficulty of acquiring per-pixel ground-truth depth data in a surgical setting. Recent works on self-supervised depth estimation use image reconstruction (i.e., the warped images) as the supervisory signal, which eliminates the need for ground-truth depth annotations but also causes over-smoothed depth predictions. Additionally, most existing depth estimation methods are built upon static laparoscopic images, ignoring rich temporal information. To address these challenges, we propose a novel spatiotemporal vision transformers-based self-supervised depth estimation method, referred to as SVT-SDE. Unlike previous works, SVT-SDE features a novel spatiotemporal vision transformers (SVT) architecture, which can learn complementary visual and temporal information from the input stereoscopic video clips. We further introduce a high-frequency-based supervisory signal, which helps to preserve fine-grained details in the estimated depth. Results from experiments conducted on two publicly available datasets demonstrate the superior performance of SVT-SDE over state-of-the-art self-supervised depth estimation methods.
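The abstract describes supervision through image reconstruction: a predicted disparity map warps one stereo view into the other, and the reconstruction error stands in for ground-truth depth. The following is a minimal NumPy sketch of that general idea for a rectified grayscale pair, assuming a left-view disparity map and a plain L1 photometric error (the SSIM terms common in such methods are omitted). The high-frequency term is an illustrative assumption only: the abstract does not specify SVT-SDE's formulation, so the 3x3 Laplacian high-pass filter, the function names, and the weight `hf_weight` are hypothetical stand-ins, not the authors' method.

```python
import numpy as np


def warp_right_to_left(right: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Reconstruct the left view of a rectified pair by sampling the right image.

    A left pixel (y, x) with disparity d corresponds to right pixel (y, x - d);
    fractional positions use linear interpolation, out-of-range ones are clamped.
    """
    h, w = right.shape
    xs = np.clip(np.arange(w)[None, :] - disparity, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    alpha = xs - x0                      # interpolation weight toward x1
    rows = np.arange(h)[:, None]
    return (1 - alpha) * right[rows, x0] + alpha * right[rows, x1]


def laplacian(img: np.ndarray) -> np.ndarray:
    """3x3 Laplacian high-pass filter with edge padding (hypothetical HF extractor)."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4 * p[1:-1, 1:-1])


def self_supervised_loss(left, right, disparity, hf_weight=0.5):
    """L1 reconstruction loss plus an assumed high-frequency L1 term."""
    recon = warp_right_to_left(right, disparity)
    photometric = np.abs(left - recon).mean()
    high_freq = np.abs(laplacian(left) - laplacian(recon)).mean()
    return photometric + hf_weight * high_freq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    right = rng.random((8, 16))
    # Synthesize a left view that is the right view shifted by 3 pixels.
    left = np.empty_like(right)
    left[:, 3:] = right[:, :-3]
    left[:, :3] = right[:, :1]           # replicate border, matching the clamping
    # Zero loss when the disparity matches the synthetic shift, positive otherwise.
    print(self_supervised_loss(left, right, np.full(right.shape, 3.0)))
    print(self_supervised_loss(left, right, np.zeros(right.shape)))
```

For a rectified pair with focal length f and baseline B, depth follows from disparity as depth = f * B / d, so minimizing a loss of this kind trains the disparity (and hence depth) predictor without depth labels.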


Pages: 42-53

Page count: 12
