SVT-SDE: Spatiotemporal Vision Transformers-Based Self-Supervised Depth Estimation in Stereoscopic Surgical Videos

Cited by: 6
Authors
Tao, Rong [1]
Huang, Baoru [2]
Zou, Xiaoyang [1]
Zheng, Guoyan [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Biomed Engn, Inst Med Robot, Shanghai 200240, Peoples R China
[2] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London SW7 2AZ, England
Source
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / No. 01
Funding
National Natural Science Foundation of China;
Keywords
Estimation; Image reconstruction; Videos; Surgery; Spatiotemporal phenomena; Feature extraction; Cameras; Depth estimation; surgical videos; spatiotemporal vision transformers; unsupervised; DEFORMATION RECOVERY; RECONSTRUCTION; NETWORKS; SURGERY;
DOI
10.1109/TMRB.2023.3237867
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Dense depth estimation plays a crucial role in developing context-aware computer-assisted intervention systems. However, it is a challenging task due to low image quality and the highly dynamic surgical environment. The task is further complicated by the difficulty of acquiring per-pixel ground truth depth data in a surgical setting. Recent works on self-supervised depth estimation use image reconstruction (i.e., the warped images) as the supervisory signal, which eliminates the requirement for ground truth depth annotations but also causes over-smoothed depth predictions. Additionally, most existing depth estimation methods are built upon static laparoscopic images, ignoring rich temporal information. To address these challenges, we propose a novel spatiotemporal vision transformers-based self-supervised depth estimation method, referred to as SVT-SDE. Unlike previous works, SVT-SDE features a novel spatiotemporal vision transformers (SVT) architecture, which can learn complementary visual and temporal information from the input stereoscopic video clips. We further introduce a high-frequency-based supervisory signal, which helps to preserve fine-grained details in the depth estimates. Results from experiments conducted on two publicly available datasets demonstrate the superior performance of SVT-SDE over state-of-the-art self-supervised depth estimation methods.
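The two supervisory signals named in the abstract can be illustrated with a minimal sketch. It is not the SVT-SDE implementation: the function names, the `hf_weight` parameter, and the choice of a horizontal second difference as the high-pass filter are all assumptions made for illustration, since the abstract does not specify them. The sketch also assumes rectified stereo pairs, where a left-image pixel at column x corresponds to the right-image pixel at column x - d for disparity d.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of floats


def warp_right_to_left(right: Image, disparity: Image) -> Image:
    """Reconstruct the left view by sampling the right image at x - d(x, y)."""
    height, width = len(right), len(right[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clamp the sampling position to the image bounds.
            src = min(max(x - disparity[y][x], 0.0), width - 1.0)
            x0 = int(src)
            x1 = min(x0 + 1, width - 1)
            w = src - x0
            # Linear interpolation along the scanline.
            out[y][x] = (1.0 - w) * right[y][x0] + w * right[y][x1]
    return out


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute per-pixel difference between two equally sized images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def high_pass(img: Image) -> Image:
    """Horizontal second difference: a stand-in high-frequency filter (assumption)."""
    width = len(img[0])
    return [
        [row[max(x - 1, 0)] + row[min(x + 1, width - 1)] - 2.0 * row[x]
         for x in range(width)]
        for row in img
    ]


def self_supervised_loss(left: Image, right: Image, disparity: Image,
                         hf_weight: float = 0.5) -> float:
    """Photometric reconstruction term plus a weighted high-frequency term."""
    recon = warp_right_to_left(right, disparity)
    photometric = mean_abs_diff(left, recon)
    high_freq = mean_abs_diff(high_pass(left), high_pass(recon))
    return photometric + hf_weight * high_freq
```

No depth labels appear anywhere in the loss: a candidate disparity map is scored only by how well it lets one view reconstruct the other. The second term compares filtered images, so a reconstruction that blurs edges is penalized even when its average intensity error is small.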
Pages: 42-53
Page count: 12
Related papers
50 in total
  • [31] Depth estimation algorithm of monocular image based on self-supervised learning
    Bai L.
    Liu L.-J.
    Li X.-A.
    Wu S.
    Liu R.-Q.
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2023, 53 (04) : 1139 - 1145
  • [32] TinyDepth: Lightweight self-supervised monocular depth estimation based on transformer
    Cheng, Zeyu
    Zhang, Yi
    Yu, Yang
    Song, Zhe
    Tang, Chengkai
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
  • [33] Self-Supervised Monocular Depth Estimation With Isometric-Self-Sample-Based Learning
    Cha, Geonho
    Jang, Ho-Deok
    Wee, Dongyoon
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (04) : 2173 - 2180
  • [34] GCNDepth: Self-supervised monocular depth estimation based on graph convolutional network
    Masoumian, Armin
    Rashwan, Hatem A.
    Abdulwahab, Saddam
    Cristiano, Julian
    Asif, M. Salman
    Puig, Domenec
    NEUROCOMPUTING, 2023, 517 : 81 - 92
  • [35] Self-Supervised Monocular Depth Estimation Based on Full Scale Feature Fusion
    Wang C.
    Chen Y.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (05) : 667 - 675
  • [36] Indoor self-supervised monocular depth estimation based on level feature fusion
    Cheng D.
    Zhang H.
    Kou Q.
    Lü C.
    Qian J.
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2023, 31 (20) : 2993 - 3009
  • [37] Depth Estimation of Monocular PCB Image Based on Self-Supervised Convolution Network
    Huang, Zedong
    Gu, Jinan
    Li, Jing
    Li, Shuwei
    Hu, Junjie
    ELECTRONICS, 2022, 11 (12)
  • [38] Self-supervised Monocular Depth Estimation Method Based on Piecewise Plane Model
    Zhang, Weiwei
    Zhang, Guanwen
    Zhou, Wei
    2024 IEEE 19TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, ICIEA 2024, 2024,
  • [39] Self-supervised monocular depth estimation based on image texture detail enhancement
    Yuanzhen Li
    Fei Luo
    Wenjie Li
    Shenjie Zheng
    Huan-huan Wu
    Chunxia Xiao
    The Visual Computer, 2021, 37 : 2567 - 2580
  • [40] Self-supervised monocular depth estimation based on combining convolution and multilayer perceptron
    Zheng, Qiumei
    Yu, Tao
    Wang, Fenghua
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 117