Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Cited by: 7
Authors
Byun, Sung-Woo [1 ]
Kim, Ju-Hee [1 ]
Lee, Seok-Pil [2 ]
Affiliations
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 17
Keywords
speech emotion recognition; emotion recognition; multi-modal emotion recognition;
DOI
10.3390/app11177967
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Recently, intelligent personal assistants, chatbots, and AI speakers have been used more broadly as communication interfaces, and demand for more natural means of interaction has increased as well. Humans express emotions in various ways, such as voice tone or facial expression; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that achieves higher accuracy by using both speech and text data, exploiting the complementary strengths of the two modalities. We extracted 43 acoustic feature vectors, including spectral features, harmonic features, and MFCCs, from speech datasets. In addition, 256-dimensional embedding vectors were extracted from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and embedding vectors were fed into separate deep learning models, each of which produced a probability distribution over the predicted output classes. The results show that the proposed model performs more accurately than previous research.
Pages: 9
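As a concrete illustration of the pipeline the abstract describes, the sketch below fuses the two modalities at the probability level. The 43-dimensional acoustic feature vector and the 256-dimensional Tacotron encoder embedding come from the abstract; the branch architectures, the four emotion classes, and probability averaging as the fusion rule are illustrative assumptions, not the authors' exact model.

```python
# Minimal late-fusion sketch of the speech + text emotion pipeline.
# Assumptions (not from the paper): hidden layer sizes, the number of
# emotion classes (4), and averaging the branch probabilities.
import torch
import torch.nn as nn

ACOUSTIC_DIM = 43   # speech feature vector size reported in the abstract
TEXT_DIM = 256      # Tacotron encoder embedding size reported in the abstract
NUM_CLASSES = 4     # assumed; the abstract does not state the class count


class Branch(nn.Module):
    """One per-modality classifier producing class probabilities."""

    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_CLASSES),
        )

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)


class LateFusion(nn.Module):
    """Averages the speech and text branch probabilities (assumed rule)."""

    def __init__(self):
        super().__init__()
        self.speech = Branch(ACOUSTIC_DIM)
        self.text = Branch(TEXT_DIM)

    def forward(self, acoustic, embedding):
        return 0.5 * (self.speech(acoustic) + self.text(embedding))


if __name__ == "__main__":
    model = LateFusion()
    acoustic = torch.randn(8, ACOUSTIC_DIM)   # batch of acoustic feature vectors
    embedding = torch.randn(8, TEXT_DIM)      # batch of text embeddings
    probs = model(acoustic, embedding)
    print(probs.shape)  # torch.Size([8, 4]); each row sums to 1
```

Other fusion rules, such as weighted averaging or concatenating hidden representations before a joint classifier, are common alternatives in multi-modal emotion recognition.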
Related Papers
50 items in total
  • [31] Intelligent ear for emotion recognition: Multi-modal emotion recognition via acoustic features, semantic contents and facial images
    Wu, CH
    Chuang, ZJ
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XII, PROCEEDINGS: APPLICATIONS OF CYBERNETICS AND INFORMATICS IN OPTICS, SIGNALS, SCIENCE AND ENGINEERING, 2004, : 122 - 127
  • [32] SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings
    Nhat Truong Pham
    Duc Ngoc Minh Dang
    Bich Ngoc Hong Pham
    Sy Dzung Nguyen
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2023, 2023, : 234 - 238
  • [33] Multi-Modal Emotion Recognition Fusing Video and Audio
    Xu, Chao
    Du, Pufeng
    Feng, Zhiyong
    Meng, Zhaopeng
    Cao, Tianyi
    Dong, Caichao
APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): 455 - 462
  • [34] A Multi-Modal Deep Learning Approach for Emotion Recognition
    Shahzad, H. M.
    Bhatti, Sohail Masood
    Jaffar, Arfan
    Rashid, Muhammad
INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (02): 1561 - 1570
  • [35] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [36] Multi-modal Emotion Recognition for Determining Employee Satisfaction
    Zaman, Farhan Uz
    Zaman, Maisha Tasnia
    Alam, Md Ashraful
    Alam, Md Golam Rabiul
    2021 IEEE ASIA-PACIFIC CONFERENCE ON COMPUTER SCIENCE AND DATA ENGINEERING (CSDE), 2021,
  • [37] Real-time emotion detection system using speech: Multi-modal fusion of different timescale features
    Kim, Samuel
    Georgiou, Panayiotis G.
    Lee, Sungbok
    Narayanan, Shrikanth
    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007, : 48 - 51
  • [38] Semantic Alignment Network for Multi-Modal Emotion Recognition
    Hou, Mixiao
    Zhang, Zheng
    Liu, Chang
    Lu, Guangming
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09): 5318 - 5329
  • [39] Emotion recognition with multi-modal peripheral physiological signals
    Gohumpu, Jennifer
    Xue, Mengru
    Bao, Yanchi
    FRONTIERS IN COMPUTER SCIENCE, 2023, 5
  • [40] Fusing Multi-modal Features for Gesture Recognition
    Wu, Jiaxiang
    Cheng, Jian
    Zhao, Chaoyang
    Lu, Hanqing
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 453 - 459