Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Cited by: 7
Authors
Byun, Sung-Woo [1 ]
Kim, Ju-Hee [1 ]
Lee, Seok-Pil [2 ]
Affiliations
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 17
Keywords
speech emotion recognition; emotion recognition; multi-modal emotion recognition;
DOI
10.3390/app11177967
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Recently, intelligent personal assistants, chatbots, and AI speakers have been used more broadly as communication interfaces, and demand for more natural means of interaction has increased as well. Humans express emotions in various ways, such as voice tone or facial expression; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that achieves higher accuracy by using both speech and text data, exploiting the complementary strengths of the two modalities. We extracted 43 acoustic feature vectors, including spectral features, harmonic features, and MFCCs, from speech datasets. In addition, 256-dimensional embedding vectors were extracted from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and embedding vectors were fed into separate deep learning models, each of which produced a probability distribution over the predicted output classes. The results show that the proposed model performs more accurately than previous research.
Pages: 9
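As a concrete illustration of the pipeline the abstract describes, the sketch below fuses the two modalities at the probability level. The 43-dimensional acoustic feature vector and the 256-dimensional Tacotron encoder embedding come from the abstract; the branch architectures, the four emotion classes, and probability averaging as the fusion rule are illustrative assumptions, not the authors' exact model.

```python
# Minimal late-fusion sketch of the speech + text emotion pipeline.
# Assumptions (not from the paper): hidden layer sizes, the number of
# emotion classes (4), and averaging the branch probabilities.
import torch
import torch.nn as nn

ACOUSTIC_DIM = 43   # speech feature vector size reported in the abstract
TEXT_DIM = 256      # Tacotron encoder embedding size reported in the abstract
NUM_CLASSES = 4     # assumed; the abstract does not state the class count


class Branch(nn.Module):
    """One per-modality classifier producing class probabilities."""

    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_CLASSES),
        )

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)


class LateFusion(nn.Module):
    """Averages the speech and text branch probabilities (assumed rule)."""

    def __init__(self):
        super().__init__()
        self.speech = Branch(ACOUSTIC_DIM)
        self.text = Branch(TEXT_DIM)

    def forward(self, acoustic, embedding):
        return 0.5 * (self.speech(acoustic) + self.text(embedding))


if __name__ == "__main__":
    model = LateFusion()
    acoustic = torch.randn(8, ACOUSTIC_DIM)   # batch of acoustic feature vectors
    embedding = torch.randn(8, TEXT_DIM)      # batch of text embeddings
    probs = model(acoustic, embedding)
    print(probs.shape)  # torch.Size([8, 4]); each row sums to 1
```

Other fusion rules, such as weighted averaging or concatenating hidden representations before a joint classifier, are common alternatives in multi-modal emotion recognition.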
Related Papers
50 items in total
  • [31] Intelligent ear for emotion recognition: Multi-modal emotion recognition via acoustic features, semantic contents and facial images
    Wu, CH
    Chuang, ZJ
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XII, PROCEEDINGS: APPLICATIONS OF CYBERNETICS AND INFORMATICS IN OPTICS, SIGNALS, SCIENCE AND ENGINEERING, 2004, : 122 - 127
  • [32] SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings
    Nhat Truong Pham
    Duc Ngoc Minh Dang
    Bich Ngoc Hong Pham
    Sy Dzung Nguyen
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2023, 2023, : 234 - 238
  • [33] Multi-Modal Emotion Recognition Fusing Video and Audio
    Xu, Chao
    Du, Pufeng
    Feng, Zhiyong
    Meng, Zhaopeng
    Cao, Tianyi
    Dong, Caichao
APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): 455 - 462
  • [34] A Multi-Modal Deep Learning Approach for Emotion Recognition
    Shahzad, H. M.
    Bhatti, Sohail Masood
    Jaffar, Arfan
    Rashid, Muhammad
INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (02): 1561 - 1570
  • [35] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [36] Multi-modal Emotion Recognition for Determining Employee Satisfaction
    Zaman, Farhan Uz
    Zaman, Maisha Tasnia
    Alam, Md Ashraful
    Alam, Md Golam Rabiul
    2021 IEEE ASIA-PACIFIC CONFERENCE ON COMPUTER SCIENCE AND DATA ENGINEERING (CSDE), 2021,
  • [37] Real-time emotion detection system using speech: Multi-modal fusion of different timescale features
    Kim, Samuel
    Georgiou, Panayiotis G.
    Lee, Sungbok
    Narayanan, Shrikanth
    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007, : 48 - 51
  • [38] Semantic Alignment Network for Multi-Modal Emotion Recognition
    Hou, Mixiao
    Zhang, Zheng
    Liu, Chang
    Lu, Guangming
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09): 5318 - 5329
  • [39] Emotion recognition with multi-modal peripheral physiological signals
    Gohumpu, Jennifer
    Xue, Mengru
    Bao, Yanchi
    FRONTIERS IN COMPUTER SCIENCE, 2023, 5
  • [40] Fusing Multi-modal Features for Gesture Recognition
    Wu, Jiaxiang
    Cheng, Jian
    Zhao, Chaoyang
    Lu, Hanqing
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 453 - 459