SVT-SDE: Spatiotemporal Vision Transformers-Based Self-Supervised Depth Estimation in Stereoscopic Surgical Videos

Cited by: 6
Authors
Tao, Rong [1]
Huang, Baoru [2]
Zou, Xiaoyang [1]
Zheng, Guoyan [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Biomed Engn, Inst Med Robot, Shanghai 200240, Peoples R China
[2] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London SW7 2AZ, England
Source
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / No. 1
Funding
National Natural Science Foundation of China
Keywords
Estimation; Image reconstruction; Videos; Surgery; Spatiotemporal phenomena; Feature extraction; Cameras; Depth estimation; surgical videos; spatiotemporal vision transformers; unsupervised; DEFORMATION RECOVERY; RECONSTRUCTION; NETWORKS; SURGERY;
DOI
10.1109/TMRB.2023.3237867
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Dense depth estimation plays a crucial role in developing context-aware computer-assisted intervention systems. However, it is a challenging task due to low image quality and the highly dynamic surgical environment. The task is further complicated by the difficulty of acquiring per-pixel ground-truth depth data in a surgical setting. Recent works on self-supervised depth estimation use image reconstruction (i.e., the warped images) as the supervisory signal, which removes the need for ground-truth depth annotations but also leads to over-smoothed depth predictions. Additionally, most existing depth estimation methods are built upon static laparoscopic images, ignoring rich temporal information. To address these challenges, we propose a novel spatiotemporal vision transformers-based self-supervised depth estimation method, referred to as SVT-SDE. Unlike previous works, SVT-SDE features a novel spatiotemporal vision transformers (SVT) architecture, which can learn complementary visual and temporal information from the input stereoscopic video clips. We further introduce a high-frequency-based supervisory signal, which helps to preserve fine-grained details in the estimated depth. Results from experiments conducted on two publicly available datasets demonstrate the superior performance of SVT-SDE over state-of-the-art self-supervised depth estimation methods.
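The abstract describes two supervisory ingredients: reconstructing one stereo view by warping the other with the predicted depth (disparity), and a high-frequency term that discourages over-smoothing. The sketch below illustrates that general idea only, under stated assumptions: a rectified grayscale stereo pair, a disparity map predicted for the left view, an L1 photometric error, and a 4-neighbour Laplacian standing in for the "high-frequency-based supervisory signal". It is not the SVT-SDE loss; the paper's exact formulation is not given here, and the function names and the `hf_weight` parameter are hypothetical.

```python
from typing import List

# A grayscale image is a list of rows of pixel intensities (assumption for this sketch).
Image = List[List[float]]


def warp_right_to_left(right: Image, disparity: Image) -> Image:
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    In a rectified pair, left pixel (x, y) with disparity d matches right pixel
    (x - d, y). Sampling uses linear interpolation along the row; coordinates
    that fall outside the image are clamped to the border.
    """
    h, w = len(right), len(right[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            src = min(max(x - disparity[y][x], 0.0), w - 1.0)
            x0 = int(src)              # floor, since src >= 0
            x1 = min(x0 + 1, w - 1)
            a = src - x0               # interpolation weight
            row.append((1.0 - a) * right[y][x0] + a * right[y][x1])
        out.append(row)
    return out


def laplacian(img: Image) -> Image:
    """4-neighbour Laplacian with replicated borders: a simple high-pass filter."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1) - 4.0 * px(y, x)
         for x in range(w)]
        for y in range(h)
    ]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean L1 difference between two images of equal size."""
    total, n = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            n += 1
    return total / n


def self_supervised_loss(left: Image, right: Image, disparity: Image,
                         hf_weight: float = 0.5) -> float:
    """Hypothetical loss: L1 photometric reconstruction + weighted high-frequency L1."""
    recon = warp_right_to_left(right, disparity)
    photometric = mean_abs_diff(recon, left)
    high_freq = mean_abs_diff(laplacian(recon), laplacian(left))
    return photometric + hf_weight * high_freq
```

With the correct disparity the reconstruction matches the left image and the loss vanishes; a wrong disparity produces a positive loss, which is what lets a network learn depth without ground-truth labels. Comparing Laplacian responses penalizes blurred edges more directly than the plain photometric term, which is one plausible way a high-frequency signal could counter over-smoothing.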
Pages: 42-53
Number of pages: 12
Related papers (50 in total)
  • [21] Embodiment: Self-Supervised Depth Estimation Based on Camera Models
    Zhang, Jinchang
    Reddy, Praveen Kumar
    Wong, Xue-Iuan
    Aloimonos, Yiannis
    Lu, Guoyu
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2024), 2024: 7809 - 7816
  • [22] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)
  • [23] Self-Supervised Monocular Depth Estimation Based on Channel Attention
    Tao, Bo
    Chen, Xinbo
    Tong, Xiliang
    Jiang, Du
    Chen, Baojia
    PHOTONICS, 2022, 9 (06)
  • [24] Video-Based Self-supervised Human Depth Estimation
    Li, Qianlin
    Zhang, Xiaoyan
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT I, 2024, 14495 : 180 - 192
  • [25] Self-supervised pain intensity estimation from facial videos via statistical spatiotemporal distillation
    Tavakolian, Mohammad
    Lopez, Miguel Bordallo
    Liu, Li
    PATTERN RECOGNITION LETTERS, 2020, 140 : 26 - 33
  • [26] GlocalFuse-Depth: Fusing transformers and CNNs for all-day self-supervised monocular depth estimation
    Zhang, Zezheng
    Chan, Ryan K. Y.
    Wong, Kenneth K. Y.
    NEUROCOMPUTING, 2024, 569
  • [27] WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters
    Lou, Ange
    Noble, Jack
    IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, MEDICAL IMAGING 2024, 2024, 12928
  • [28] TransDSSL: Transformer Based Depth Estimation via Self-Supervised Learning
    Han, Daechan
    Shin, Jeongmin
    Kim, Namil
    Hwang, Soonmin
    Choi, Yukyung
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (04): 10969 - 10976
  • [29] Self-supervised Depth Estimation based on Feature Sharing and Consistency Constraints
    Mendoza, Julio
    Pedrini, Helio
    PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 5: VISAPP, 2020: 134 - 141
  • [30] Self-Supervised Learning of Monocular Depth Estimation Based on Progressive Strategy
    Wang, Huachun
    Sang, Xinzhu
    Chen, Duo
    Wang, Peng
    Yan, Binbin
    Qi, Shuai
    Ye, Xiaoqian
    Yao, Tong
    IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, 2021, 7 : 375 - 383