SVT-SDE: Spatiotemporal Vision Transformers-Based Self-Supervised Depth Estimation in Stereoscopic Surgical Videos

Cited by: 6

Authors:

Tao, Rong ^{[1]}

Huang, Baoru ^{[2]}

Zou, Xiaoyang ^{[1]}

Zheng, Guoyan ^{[1]}

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Biomed Engn, Inst Med Robot, Shanghai 200240, Peoples R China

[2] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London SW7 2AZ, England

Source:

IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / No. 1

Funding:

National Natural Science Foundation of China;

Keywords:

Estimation; Image reconstruction; Videos; Surgery; Spatiotemporal phenomena; Feature extraction; Cameras; Depth estimation; surgical videos; spatiotemporal vision transformers; unsupervised; DEFORMATION RECOVERY; RECONSTRUCTION; NETWORKS; SURGERY;

DOI:

10.1109/TMRB.2023.3237867

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831;

Abstract:

Dense depth estimation plays a crucial role in developing context-aware computer-assisted intervention systems. However, it is a challenging task due to low image quality and the highly dynamic surgical environment. The task is further complicated by the difficulty of acquiring per-pixel ground truth depth data in a surgical setting. Recent works on self-supervised depth estimation use image reconstruction (i.e., the warped images) as the supervisory signal, which eliminates the requirement for ground truth depth annotations but also causes over-smoothed depth predictions. Additionally, most existing depth estimation methods are built upon static laparoscopic images, ignoring rich temporal information. To address these challenges, we propose a novel spatiotemporal vision transformers-based self-supervised depth estimation method, referred to as SVT-SDE. Unlike previous works, SVT-SDE features a novel spatiotemporal vision transformers (SVT) architecture, which can learn complementary visual and temporal information from the input stereoscopic video clips. We further introduce a high-frequency-based supervisory signal, which helps to preserve fine-grained details in the depth estimates. Results from experiments conducted on two publicly available datasets demonstrate the superior performance of SVT-SDE over state-of-the-art self-supervised depth estimation methods.


Pages: 42 - 53

Number of pages: 12

50 items in total

[41] Self-supervised monocular depth estimation based on image texture detail enhancement
Li, Yuanzhen
Luo, Fei
Li, Wenjie
Zheng, Shenjie
Wu, Huan-huan
Xiao, Chunxia
VISUAL COMPUTER, 2021, 37 (9-11) : 2567 - 2580
[42] Self-supervised Monocular Depth Estimation Based on Semantic Assistance and Depth Temporal Consistency Constraints
Ling, Chuanwu
Chen, Hua
Xu, Dayong
Zhang, Xiaogang
Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences, 2024, 51 (08) : 1 - 12
[43] Self-Supervised Monocular Depth Estimation With Frequency-Based Recurrent Refinement
Li, Rui
Xue, Danna
Zhu, Yu
Wu, Hao
Sun, Jinqiu
Zhang, Yanning
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 5626 - 5637
[44] Parameter search-based scaling network for self-supervised depth estimation
Xiao, Yuhan
Sun, Shang
Liao, TaoLin
THIRD INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION; NETWORK AND COMPUTER TECHNOLOGY (ECNCT 2021), 2022, 12167
[45] Self-supervised monocular depth estimation in dynamic scenes based on deep learning
Cheng, Binbin
Yu, Ying
Zhang, Lei
Wang, Ziquan
Jiang, Zhipeng
National Remote Sensing Bulletin, 2024, 28 (09) : 2170 - 2186
[46] HQDec: Self-Supervised Monocular Depth Estimation Based on a High-Quality Decoder
Wang, Fei
Cheng, Jun
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2453 - 2468
[47] ATTENTION-BASED SELF-SUPERVISED LEARNING MONOCULAR DEPTH ESTIMATION WITH EDGE REFINEMENT
Jiang, Chenweinan
Liu, Haichun
Li, Lanzhen
Pan, Changchun
2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021 : 3218 - 3222
[48] Discriminative-Guided Diffusion-Based Self-supervised Monocular Depth Estimation
Liu, Runze
Zhang, Guanghui
Zhu, Dongchen
Wang, Lei
Zhang, Xiaolin
Li, Jiamao
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VI, 2025, 15036 : 328 - 342
[49] Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions
Wang, Xiuling
Yu, Minglin
Wang, Haixia
Lu, Xiao
Zhang, Zhiguo
IEEE SENSORS JOURNAL, 2024, 24 (04) : 4978 - 4991
[50] LAM-Depth: Laplace-Attention Module-Based Self-Supervised Monocular Depth Estimation
Wei, Jiansheng
Pan, Shuguo
Gao, Wang
Guo, Peng
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (10) : 13706 - 13716
