SVT-SDE: Spatiotemporal Vision Transformers-Based Self-Supervised Depth Estimation in Stereoscopic Surgical Videos

Cited by: 6
Authors
Tao, Rong [1]
Huang, Baoru [2]
Zou, Xiaoyang [1]
Zheng, Guoyan [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Biomed Engn, Inst Med Robot, Shanghai 200240, Peoples R China
[2] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London SW7 2AZ, England
Source
Funding
National Natural Science Foundation of China;
Keywords
Estimation; Image reconstruction; Videos; Surgery; Spatiotemporal phenomena; Feature extraction; Cameras; Depth estimation; surgical videos; spatiotemporal vision transformers; unsupervised; DEFORMATION RECOVERY; RECONSTRUCTION; NETWORKS; SURGERY;
DOI
10.1109/TMRB.2023.3237867
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Dense depth estimation plays a crucial role in developing context-aware computer-assisted intervention systems. However, it is a challenging task because of low image quality and the highly dynamic surgical environment. The task is further complicated by the difficulty of acquiring per-pixel ground-truth depth data in a surgical setting. Recent works on self-supervised depth estimation use image reconstruction (i.e., the warped images) as the supervisory signal, which eliminates the need for ground-truth depth annotations but also causes over-smoothed depth predictions. Additionally, most existing depth estimation methods are built upon static laparoscopic images and ignore rich temporal information. To address these challenges, we propose a novel spatiotemporal vision transformers-based self-supervised depth estimation method, referred to as SVT-SDE. Unlike previous works, SVT-SDE features a novel spatiotemporal vision transformers (SVT) architecture, which can learn complementary visual and temporal information from the input stereoscopic video clips. We further introduce a high-frequency-based supervisory signal, which helps to preserve fine-grained details in the depth estimates. Experiments conducted on two publicly available datasets demonstrate that SVT-SDE outperforms state-of-the-art self-supervised depth estimation methods.
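The abstract describes two supervisory signals: reconstruction of one stereo view by warping the other, and an additional high-frequency term. The record does not give the paper's actual formulation, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: rectified grayscale stereo pairs, an L1 photometric error, and a box-blur residual standing in for "high frequency". The function names and the `hf_weight` parameter are hypothetical and are not taken from SVT-SDE.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d.

    Assumes rectified stereo, so correspondence is purely horizontal.
    Uses linear interpolation and clamps samples to the image border.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def high_pass(img):
    """Image minus its 3x3 box blur: a crude high-frequency residual.

    This filter is an illustrative assumption, not the paper's choice.
    """
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    blurred = sum(
        padded[i:i + h, j:j + w] for i in range(3) for j in range(3)
    ) / 9.0
    return img - blurred

def reconstruction_loss(left, right, disparity, hf_weight=0.5):
    """L1 photometric error plus a weighted L1 error on high-pass residuals."""
    recon = warp_right_to_left(right, disparity)
    photometric = np.abs(left - recon).mean()
    high_freq = np.abs(high_pass(left) - high_pass(recon)).mean()
    return photometric + hf_weight * high_freq

# Toy usage: synthesize a left view from a known constant disparity,
# then compare the loss at the true disparity and at a wrong one.
rng = np.random.default_rng(0)
right_img = rng.random((8, 16))
true_disp = np.full((8, 16), 2.0)
left_img = warp_right_to_left(right_img, true_disp)
loss_true = reconstruction_loss(left_img, right_img, true_disp)
loss_wrong = reconstruction_loss(left_img, right_img, np.zeros((8, 16)))
```

In this toy setup the loss at the true disparity is lower than at the wrong one, which is what makes reconstruction usable as a training signal without ground-truth depth. The second term penalizes differences in fine detail that a plain photometric average tends to under-weight.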
Pages: 42-53
Page count: 12
Related papers
50 in total
  • [1] Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
    Karpov, Aleksei
    Makarov, Ilya
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2022), 2022, : 711 - 719
  • [2] Adaptive Self-supervised Depth Estimation in Monocular Videos
    Mendoza, Julio
    Pedrini, Helio
    IMAGE AND GRAPHICS (ICIG 2021), PT III, 2021, 12890 : 687 - 699
  • [3] Self-Supervised Human Depth Estimation from Monocular Videos
    Tan, Feitong
    Zhu, Hao
    Cui, Zhaopeng
    Zhu, Siyu
    Pollefeys, Marc
    Tan, Ping
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 647 - 656
  • [4] Spatially variant biases considered self-supervised depth estimation based on laparoscopic videos
    Li, Wenda
    Hayashi, Yuichiro
    Oda, Masahiro
    Kitasaka, Takayuki
    Misawa, Kazunari
    Mori, Kensaku
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2022, 10 (03): : 274 - 282
  • [5] Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics
    Varma, Arnav
    Chawla, Hemang
    Zonooz, Bahram
    Arani, Elahe
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 4, 2022, : 758 - 769
  • [6] SELF-SUPERVISED DEPTH ESTIMATION VIA IMPLICIT CUES FROM VIDEOS
    Wang, Jianrong
    Zhang, Ge
    Wu, Zhenyu
    Li, Xuewei
    Liu, Li
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2485 - 2489
  • [7] Self-supervised monocular depth estimation from oblique UAV videos
    Madhuanand, Logambal
    Nex, Francesco
    Yang, Michael Ying
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2021, 176 : 1 - 14
  • [8] Depth Estimation for Colonoscopy Images with Self-supervised Learning from Videos
    Cheng, Kai
    Ma, Yiting
    Sun, Bin
    Li, Yang
    Chen, Xuejin
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT VI, 2021, 12906 : 119 - 128
  • [9] MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer
    Zhao, Chaoqiang
    Zhang, Youmin
    Poggi, Matteo
    Tosi, Fabio
    Guo, Xianda
    Zhu, Zheng
    Huang, Guan
    Tang, Yang
    Mattoccia, Stefano
    2022 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV, 2022, : 668 - 678
  • [10] TSD-Depth: Using transformers and self-distilling for self-supervised indoor depth estimation
    Lv C.
    Han C.
    Chen J.
    Cheng D.
    Qian J.
    Optik, 2023, 288