SVT-SDE: Spatiotemporal Vision Transformers-Based Self-Supervised Depth Estimation in Stereoscopic Surgical Videos

Cited by: 6

Authors:

Tao, Rong [1]

Huang, Baoru [2]

Zou, Xiaoyang [1]

Zheng, Guoyan [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Biomed Engn, Inst Med Robot, Shanghai 200240, Peoples R China

[2] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London SW7 2AZ, England

Source:

IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / No. 1

Funding:

National Natural Science Foundation of China;

Keywords:

Estimation; Image reconstruction; Videos; Surgery; Spatiotemporal phenomena; Feature extraction; Cameras; Depth estimation; surgical videos; spatiotemporal vision transformers; unsupervised; DEFORMATION RECOVERY; RECONSTRUCTION; NETWORKS; SURGERY;

DOI:

10.1109/TMRB.2023.3237867

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831;

Abstract:

Dense depth estimation plays a crucial role in developing context-aware computer-assisted intervention systems. However, it is a challenging task due to low image quality and the highly dynamic surgical environment. The task is further complicated by the difficulty of acquiring per-pixel ground truth depth data in a surgical setting. Recent works on self-supervised depth estimation use image reconstruction (i.e., the warped images) as the supervisory signal, which eliminates the requirement for ground truth depth annotations but also causes over-smoothed depth predictions. Additionally, most existing depth estimation methods are built upon static laparoscopic images, ignoring rich temporal information. To address these challenges, we propose a novel spatiotemporal vision transformers-based self-supervised depth estimation method, referred to as SVT-SDE. Unlike previous works, SVT-SDE features a novel spatiotemporal vision transformers (SVT) architecture, which can learn complementary visual and temporal information from the input stereoscopic video clips. We further introduce a high-frequency-based supervisory signal, which helps to preserve fine-grained details of depth estimation. Results from experiments conducted on two publicly available datasets demonstrate the superior performance of SVT-SDE over the state-of-the-art self-supervised depth estimation methods.
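
The abstract refers to image reconstruction from warped images as the supervisory signal. The following is a minimal sketch of that general idea for a single scanline of a rectified stereo pair, not the SVT-SDE implementation. It assumes a left pixel at column x corresponds to the right pixel at column x - d(x), and the function names (`warp_scanline`, `photometric_l1`, `disparity_to_depth`) and the numeric values are hypothetical, chosen only for illustration.

```python
def warp_scanline(right_row, disparity_row):
    """Reconstruct a left scanline by sampling the right scanline at x - d(x)."""
    width = len(right_row)
    warped = []
    for x, d in enumerate(disparity_row):
        # Clamp the sampling position to the image border (an assumption).
        src = min(max(x - d, 0.0), width - 1.0)
        x0 = int(src)                  # floor, valid because src >= 0
        x1 = min(x0 + 1, width - 1)
        w = src - x0                   # linear interpolation weight
        warped.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return warped


def photometric_l1(target_row, warped_row):
    """Mean absolute difference between the target and the reconstruction."""
    return sum(abs(t - w) for t, w in zip(target_row, warped_row)) / len(target_row)


def disparity_to_depth(disparity, focal_px, baseline):
    """Standard rectified-stereo relation: depth = focal length * baseline / disparity."""
    return focal_px * baseline / disparity


# Toy data: the left row is the right row shifted by 2 pixels (border clamped).
right = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
left = [0.0, 0.0, 0.0, 10.0, 20.0, 30.0]

loss_correct = photometric_l1(left, warp_scanline(right, [2.0] * 6))
loss_wrong = photometric_l1(left, warp_scanline(right, [0.0] * 6))
depth = disparity_to_depth(2.0, focal_px=500.0, baseline=4.0)
```

In this toy case the correct disparity gives a lower reconstruction error than a wrong one, which is what lets the error act as a training signal without depth labels. A plain intensity difference of this kind says little about fine detail, which is consistent with the over-smoothing the abstract mentions.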

Pages: 42-53

Page count: 12
