Audio-Guided Fusion Techniques for Multimodal Emotion Analysis

Cited: 0
Authors
Shi, Pujin [1]
Gao, Fei [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, State Key Lab Networking & Switching Technol, Beijing, Peoples R China
Keywords
Multimodal emotion recognition; Multimodal feature fusion; Self-supervised learning; RECOGNITION; SLEEP;
DOI
10.1145/3689092.3689414
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a solution for the semi-supervised learning track (MER-SEMI) of MER2024. First, to enhance the performance of the feature extractors on sentiment classification tasks, we fine-tune the video and text feature extractors, specifically CLIP-vit-large and Baichuan-13B, on labeled data. This approach effectively preserves the emotional information conveyed in the videos. Second, we propose an Audio-Guided Transformer (AGT) fusion mechanism that leverages the robustness of Hubert-large and proves highly effective at fusing both inter-channel and intra-channel information. Third, to improve model accuracy, we iteratively apply self-supervised learning, treating high-confidence predictions on unlabeled data as pseudo-labels. Finally, through black-box probing we discovered an imbalanced data distribution between the training and test sets, so we adopt a prior-knowledge-based voting mechanism. The results demonstrate the effectiveness of our strategy, which ultimately earned us third place in the MER-SEMI track.
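To make the fusion idea in the abstract concrete, the following is a minimal PyTorch sketch of one way an audio-guided Transformer fusion block could be organized: per-modality self-attention for intra-channel information, and audio-queried cross-attention over video and text tokens for inter-channel information. The module names, dimensions, single-layer depth, and mean pooling are illustrative assumptions, not the authors' exact AGT design.

    # Hedged sketch of an audio-guided fusion block (assumed design, not the paper's exact AGT).
    import torch
    import torch.nn as nn

    class AudioGuidedFusion(nn.Module):
        def __init__(self, dim=1024, num_heads=8, num_classes=6):
            super().__init__()
            # Intra-channel modelling: each modality is refined independently.
            self.audio_self = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
            self.video_self = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
            self.text_self = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
            # Inter-channel modelling: audio tokens query the video/text tokens.
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)
            self.classifier = nn.Linear(dim, num_classes)

        def forward(self, audio, video, text):
            # Inputs: (batch, seq_len, dim) features from Hubert-large, CLIP-vit-large,
            # and Baichuan-13B respectively (projected to a shared dimension beforehand).
            a = self.audio_self(audio)
            v = self.video_self(video)
            t = self.text_self(text)
            kv = torch.cat([v, t], dim=1)          # key/value pool drawn from video + text
            fused, _ = self.cross_attn(a, kv, kv)  # audio-guided cross-attention
            fused = self.norm(fused + a)           # residual connection around the fusion step
            return self.classifier(fused.mean(dim=1))  # utterance-level prediction

In practice the number of layers, attention heads, and the pooling strategy would be tuned on the labeled MER2024 data.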
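The iterative pseudo-labelling step can likewise be read as a standard confidence-thresholding loop. The sketch below, with an assumed softmax-confidence threshold of 0.9, illustrates how high-confidence unlabeled samples might be promoted to pseudo-labeled training data between rounds.

    # Hedged sketch of confidence-based pseudo-label selection (the threshold is an assumption).
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def select_pseudo_labels(model, unlabeled_loader, threshold=0.9, device="cuda"):
        model.eval()
        selected = []
        for audio, video, text in unlabeled_loader:
            logits = model(audio.to(device), video.to(device), text.to(device))
            probs = F.softmax(logits, dim=-1)
            conf, pred = probs.max(dim=-1)
            keep = conf >= threshold  # keep only high-confidence samples
            if keep.any():
                # Store features together with their predicted class as the pseudo-label.
                selected.append((audio[keep], video[keep], text[keep], pred[keep].cpu()))
        return selected

The selected batches would then be merged with the labeled set and the model retrained, repeating the select-and-retrain cycle until performance on a validation split stops improving.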
Pages: 62-66
Page count: 5