SVT-SDE: Spatiotemporal Vision Transformers-Based Self-Supervised Depth Estimation in Stereoscopic Surgical Videos

Cited by: 6
Authors
Tao, Rong [1]
Huang, Baoru [2]
Zou, Xiaoyang [1]
Zheng, Guoyan [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Biomed Engn, Inst Med Robot, Shanghai 200240, Peoples R China
[2] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London SW7 2AZ, England
Source
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
Estimation; Image reconstruction; Videos; Surgery; Spatiotemporal phenomena; Feature extraction; Cameras; Depth estimation; surgical videos; spatiotemporal vision transformers; unsupervised; DEFORMATION RECOVERY; RECONSTRUCTION; NETWORKS; SURGERY;
DOI
10.1109/TMRB.2023.3237867
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Dense depth estimation plays a crucial role in developing context-aware computer-assisted intervention systems. However, it is a challenging task due to low image quality and the highly dynamic surgical environment. The task is further complicated by the difficulty of acquiring per-pixel ground-truth depth data in a surgical setting. Recent works on self-supervised depth estimation use image reconstruction (i.e., the warped images) as the supervisory signal, which helps to eliminate the requirement for ground-truth depth annotations but also causes over-smoothed depth predictions. Additionally, most existing depth estimation methods are built upon static laparoscopic images, ignoring rich temporal information. To address these challenges, we propose a novel spatiotemporal vision transformers-based self-supervised depth estimation method, referred to as SVT-SDE. Unlike previous works, SVT-SDE features a novel spatiotemporal vision transformers (SVT) architecture, which can learn complementary visual and temporal information from the input stereoscopic video clips. We further introduce a high-frequency-based supervisory signal, which helps to preserve fine-grained details of depth estimation. Results from experiments conducted on two publicly available datasets demonstrate the superior performance of SVT-SDE over the state-of-the-art self-supervised depth estimation methods.
Pages: 42-53
Number of pages: 12