Multi-Modal Emotion Recognition Fusing Video and Audio

Cited by: 4
Authors
Xu, Chao [1 ]
Du, Pufeng [2 ]
Feng, Zhiyong [2 ]
Meng, Zhaopeng [1 ]
Cao, Tianyi [2 ]
Dong, Caichao [2 ]
Affiliations
[1] Tianjin Univ, Sch Comp Software, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
Funding
National Science Foundation (US);
Keywords
Emotion Recognition; Multi-modal Fusion; HMM; Multi-layer Perceptron;
DOI
10.12785/amis/070205
Chinese Library Classification: O29 [Applied Mathematics];
Subject Classification Code: 070104;
Abstract
Emotion plays an important role in human communication. We construct a framework for multi-modal emotion recognition that fuses video and audio. Facial expression features and speech features are extracted from image sequences and speech signals, respectively. To locate and track facial feature points, we build an Active Appearance Model covering facial images with a wide range of expressions. Facial Animation Parameters, computed from the motions of the facial feature points, serve as expression features. From each speech frame we extract the short-term mean energy, the fundamental frequency, and the formant frequencies as speech features. An emotion classifier based on Hidden Markov Models and a Multi-layer Perceptron fuses the facial expression and speech features. Experiments indicate that the multi-modal fusion emotion recognition algorithm presented in this paper achieves relatively high recognition accuracy, with better performance and robustness than methods using video or audio alone.
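The per-frame speech-feature step described in the abstract (short-term mean energy and fundamental frequency) can be sketched as below. This is a minimal illustration, not the paper's exact procedure: the frame length, hop size, and the autocorrelation-based F0 estimator are assumptions, and formant extraction is omitted.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of length frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

def short_term_energy(frames):
    """Short-term mean energy of each frame."""
    return np.mean(frames ** 2, axis=1)

def fundamental_freq(frames, sr, fmin=50.0, fmax=400.0):
    """Estimate F0 per frame by picking the autocorrelation peak
    in the lag range corresponding to [fmin, fmax] Hz.
    Returns 0.0 for frames where no estimate is made."""
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    f0 = np.zeros(len(frames))
    for i, frame in enumerate(frames):
        frame = frame - frame.mean()
        # One-sided autocorrelation (lag 0 .. frame_len-1).
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        if ac[0] <= 0:          # silent frame, skip
            continue
        lag = lag_min + np.argmax(ac[lag_min:lag_max])
        f0[i] = sr / lag
    return f0
```

For a 200 Hz sinusoid sampled at 8 kHz, framed with a 256-sample window and a 128-sample hop, the estimator recovers F0 near 200 Hz and a mean energy near 0.5, which is a quick sanity check before moving to real speech.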
Pages: 455-462
Number of pages: 8
Related Papers
50 records in total
  • [21] Erratum to: Multi-modal Gesture Recognition using Integrated Model of Motion, Audio and Video
    GOUTSU Yusuke
    KOBAYASHI Takaki
    OBARA Junya
    KUSAJIMA Ikuo
    TAKEICHI Kazunari
    TAKANO Wataru
    NAKAMURA Yoshihiko
    Chinese Journal of Mechanical Engineering, 2017, 30 : 1473 - 1473
  • [22] A Multi-modal Gesture Recognition System Using Audio, Video, and Skeletal Joint Data
    Nandakumar, Karthik
    Wah, Wan Kong
    Alice, Chan Siu Man
    Terence, Ng Wen Zheng
    Gang, Wang Jian
    Yun, Yau Wei
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 475 - 482
  • [23] Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition
    Nie, Weizhi
    Yan, Yan
    Song, Dan
    Wang, Kun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 16205 - 16214
  • [24] Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition
    Weizhi Nie
    Yan Yan
    Dan Song
    Kun Wang
    Multimedia Tools and Applications, 2021, 80 : 16205 - 16214
  • [25] Multi-Modal Multi-Action Video Recognition
    Shi, Zhensheng
    Liang, Ju
    Li, Qianqian
    Zheng, Haiyong
    Gu, Zhaorui
    Dong, Junyu
    Zheng, Bing
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13658 - 13667
  • [26] Multi-modal Laughter Recognition in Video Conversations
    Escalera, Sergio
    Puertas, Eloi
    Radeva, Petia
    Pujol, Oriol
    2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2, 2009, : 869 - 874
  • [27] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [28] A Multi-Modal Deep Learning Approach for Emotion Recognition
    Shahzad, H. M.
    Bhatti, Sohail Masood
    Jaffar, Arfan
    Rashid, Muhammad
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (02): : 1561 - 1570
  • [29] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [30] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155