Deep Multimodal Fusion for Surgical Feedback Classification

Cited by: 0
Authors
Kocielnik, Rafal [1]
Wong, Elyssa Y. [2]
Chu, Timothy N. [2]
Lin, Lydia [1,2]
Huang, De-An [3]
Wang, Jiayun [1]
Anandkumar, Anima [1]
Hung, Andrew J. [4]
Institutions
[1] CALTECH, Pasadena, CA 91125 USA
[2] Univ Southern Calif, Los Angeles, CA USA
[3] NVIDIA, Santa Clara, CA USA
[4] Cedars Sinai Med Ctr, Los Angeles, CA USA
Funding
U.S. National Institutes of Health (NIH);
Keywords
Surgical feedback; Multimodality; Robot-Assisted Surgery; Deep Learning;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvement in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., visual cues like pointing to anatomic structures). In this work, we leverage a clinically-validated five-category classification of surgical feedback: "Anatomic", "Technical", "Procedural", "Praise" and "Visual Aid". We then develop a multi-label machine learning model to classify these five categories of surgical feedback from inputs of text, audio, and video modalities. The ultimate goal of our work is to help automate the annotation of real-time contextual surgical feedback at scale. Our automated classification of surgical feedback achieves AUCs ranging from 71.5 to 77.6, with fusion improving performance by 3.1%. We also show that high-quality manual transcriptions of feedback audio from experts improve AUCs to between 76.5 and 96.2, which demonstrates a clear path toward future improvements. Empirically, we find that a staged training strategy, which first pre-trains each modality separately and then trains them jointly, is more effective than training all modalities together from the start. We also present intuitive findings on the importance of modalities for different feedback categories. This work offers an important first look at the feasibility of automated classification of real-world live surgical feedback based on text, audio, and video modalities.
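The staged strategy described in the abstract, pre-training a head for each modality separately and then training a joint head on the fused outputs, can be sketched on toy data. The sketch below is an illustrative assumption, not the authors' implementation: the synthetic embeddings, the NumPy logistic-regression heads, and the logit-concatenation fusion are all stand-ins for the paper's text/audio/video encoders and deep fusion model.

```python
import numpy as np

rng = np.random.default_rng(0)
CATEGORIES = ["Anatomic", "Technical", "Procedural", "Praise", "Visual Aid"]

def sigmoid(z):
    # Clip to avoid overflow in exp for large logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

def train_logistic(X, Y, lr=0.1, epochs=200):
    """Multi-label logistic head: one sigmoid per feedback category."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        G = sigmoid(X @ W + b) - Y          # gradient of binary cross-entropy
        W -= lr * X.T @ G / len(X)
        b -= lr * G.mean(axis=0)
    return W, b

# Toy per-modality embeddings (stand-ins for text/audio/video encoders),
# generated so each modality carries a noisy linear view of the labels.
n = 200
Y = rng.integers(0, 2, size=(n, len(CATEGORIES))).astype(float)
feats = {m: Y @ rng.normal(size=(len(CATEGORIES), 8)) + 0.5 * rng.normal(size=(n, 8))
         for m in ("text", "audio", "video")}

# Stage 1: pre-train each modality's head separately.
heads = {m: train_logistic(X, Y) for m, X in feats.items()}

# Stage 2: concatenate per-modality logits and train a joint fusion head.
fused = np.concatenate([feats[m] @ W + b for m, (W, b) in heads.items()], axis=1)
W_f, b_f = train_logistic(fused, Y)
probs = sigmoid(fused @ W_f + b_f)
acc = ((probs > 0.5) == Y).mean()
print(f"joint multi-label training accuracy: {acc:.2f}")
```

The design point the sketch mirrors is that Stage 2 only sees each modality through its pre-trained head, so the fusion layer learns how to weight modalities rather than learning every encoder from scratch at once.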
Pages: 256-267 (12 pages)