Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition

Cited by: 64
Authors
Zhang, Shiqing [1, 2]
Zhang, Shiliang [1]
Huang, Tiejun [1]
Gao, Wen [1]
Affiliations
[1] Peking Univ, Sch EE&CS, Beijing, Peoples R China
[2] Taizhou Univ, Inst Intelligent Informat Proc, Taizhou, Peoples R China
Keywords
Emotion recognition; Multimodal deep learning; Deep convolutional neural network
DOI
10.1145/2911996.2912051
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Emotion recognition is a challenging task because of the emotional gap between subjective emotion and low-level audio-visual features. Inspired by the recent success of deep learning in bridging the semantic gap, this paper proposes to bridge the emotional gap with a multimodal Deep Convolutional Neural Network (DCNN) that fuses audio and visual cues in a single deep model. The multimodal DCNN is trained in two stages. First, two DCNN models pre-trained on large-scale image data are fine-tuned to perform audio and visual emotion recognition, respectively, on the corresponding labeled speech and face data. Second, the outputs of these two DCNNs are integrated by a fusion network consisting of several fully-connected layers, which is trained to obtain a joint audio-visual feature representation for emotion recognition. Experimental results on the RML audio-visual database demonstrate the promising performance of the proposed method. To the best of our knowledge, this is an early work fusing audio and visual cues in a DCNN for emotion recognition, and its success warrants further research in this direction.
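The second training stage can be pictured as a small feed-forward head on top of the two unimodal DCNNs. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: it assumes each fine-tuned DCNN already yields a fixed-length feature vector, concatenates the audio and visual vectors, passes them through fully-connected ReLU layers to form a joint representation, and applies a softmax over emotion classes. The names `FusionNetwork`, `joint_features`, and `predict`, the layer sizes, and the choice of six output classes are all assumptions for illustration; training the weights (for example by backpropagation on a cross-entropy loss) is omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully-connected layer y = W x + b, with W stored as (out_dim, in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Random uniform weights scaled by 1/sqrt(n_in); zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class FusionNetwork:
    """Hypothetical fusion head: concatenated audio and visual DCNN features
    go through fully-connected layers to emotion class probabilities."""

    def __init__(self, audio_dim: int, visual_dim: int,
                 hidden_dims: List[int], num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        dims = [audio_dim + visual_dim] + list(hidden_dims) + [num_classes]
        self.layers = [init_layer(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]

    def joint_features(self, audio: Vector, visual: Vector) -> Vector:
        """Output of the last hidden layer: the joint audio-visual representation."""
        h = audio + visual  # list concatenation performs feature-level fusion
        for w, b in self.layers[:-1]:
            h = relu(dense(h, w, b))
        return h

    def predict(self, audio: Vector, visual: Vector) -> Vector:
        """Class probabilities from the final fully-connected layer plus softmax."""
        w, b = self.layers[-1]
        return softmax(dense(self.joint_features(audio, visual), w, b))


if __name__ == "__main__":
    # Toy 8-dimensional features stand in for real DCNN outputs (assumption).
    net = FusionNetwork(audio_dim=8, visual_dim=8, hidden_dims=[16, 16], num_classes=6)
    probs = net.predict([0.1] * 8, [0.2] * 8)
    print(len(probs), round(sum(probs), 6))
```

Running the script prints the number of classes and the sum of the probabilities (which is 1.0 up to rounding). Feeding the fused vector through shared hidden layers, rather than simply averaging the two unimodal predictions, is what lets such a head learn cross-modal interactions for the joint representation.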
Pages: 281-284
Page count: 4