MM-EMOR: Multi-Modal Emotion Recognition of Social Media Using Concatenated Deep Learning Networks

Cited by: 1

Authors:

Adel, Omar [1]

Fathalla, Karma M. [1]

Abo ElFarag, Ahmed [1]

Affiliations:

[1] Arab Acad Sci Technol & Maritime Transport AAST, Fac Engn & Technol, Dept Comp Engn, Alexandria, Egypt

Source:

BIG DATA AND COGNITIVE COMPUTING | 2023 / Vol. 7 / Issue 04

Keywords:

classification; MobileNet; Roberta; multimodal; emotion; recognition; IEMOCAP; MELD; social media; FEATURES;

DOI:

10.3390/bdcc7040164

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Emotion recognition is crucial in artificial intelligence, particularly in human-computer interaction. The ability to accurately discern and interpret emotions helps machines decipher users' underlying intentions, allowing for a more streamlined interaction and a better user experience. The recent increase in social media usage, together with the availability of an immense amount of unstructured data, has created significant demand for automated emotion recognition systems. Artificial intelligence (AI) techniques have emerged as a powerful solution in this context. In particular, multimodal AI-driven approaches have proven beneficial in capturing the interplay of human expression cues across multiple modalities. The current study develops a multimodal emotion recognition system, MM-EMOR, to improve emotion recognition on the audio and text modalities. Audio is processed using Mel spectrogram features, Chromagram features, and the MobileNet Convolutional Neural Network (CNN), while an attention-based RoBERTa model handles the text modality. The approach is evaluated on three different datasets. The empirical findings show that MM-EMOR outperforms competing models on the same datasets, with accuracy gains of 7% on one dataset, 8% on another, and 18% on the third.
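
The abstract describes an audio branch and a text branch whose outputs are combined, and the title refers to concatenated networks. As a rough illustration only, the sketch below shows feature-level fusion by concatenation followed by a single dense layer and a softmax. It is not the authors' implementation: the vector sizes, the random weights, the number of classes, and the helper names (`concat_features`, `linear`, `softmax`, `classify`) are all hypothetical, and the real embeddings would come from MobileNet and RoBERTa rather than a random generator.

```python
import math
import random


def concat_features(audio_vec, text_vec):
    """Fuse two modality embeddings by simple concatenation."""
    return list(audio_vec) + list(text_vec)


def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(audio_vec, text_vec, weights, bias):
    """Concatenate both embeddings, then map them to class probabilities."""
    fused = concat_features(audio_vec, text_vec)
    return softmax(linear(fused, weights, bias))


# Hypothetical sizes; stand-ins for the audio (CNN) and text (transformer) embeddings.
AUDIO_DIM, TEXT_DIM, NUM_CLASSES = 4, 3, 4
rng = random.Random(0)
audio_vec = [rng.random() for _ in range(AUDIO_DIM)]
text_vec = [rng.random() for _ in range(TEXT_DIM)]

# Untrained, randomly initialised classifier weights (assumption, for shape only).
weights = [[rng.uniform(-1, 1) for _ in range(AUDIO_DIM + TEXT_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = classify(audio_vec, text_vec, weights, bias)
predicted_class = probs.index(max(probs))
```

Concatenation keeps both modality representations intact and lets the classifier learn how to weight them; the abstract does not state which layers follow the fusion step, so the single dense layer here is only a placeholder.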


Pages: 21

50 items in total

[1] A Multi-Modal Deep Learning Approach for Emotion Recognition
Shahzad, H. M.
Bhatti, Sohail Masood
Jaffar, Arfan
Rashid, Muhammad
INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (02): 1561-1570
[2] A multi-modal deep learning system for Arabic emotion recognition
Abu Shaqra F.
Duwairi R.
Al-Ayyoub M.
International Journal of Speech Technology, 2023, 26 (01): 123-139
[3] Multi-modal Feature Distillation Emotion Recognition Method For Social Media
Chang, Xue
Wang, Mingjiang
Deng, Xiao
2024 IEEE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2024: 445-454
[4] Multi-Modal Emotion Recognition Based On deep Learning Of EEG And Audio Signals
Li, Zhongjie
Zhang, Gaoyan
Dang, Jianwu
Wang, Longbiao
Wei, Jianguo
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
[5] Multi-modal embeddings using multi-task learning for emotion recognition
Khare, Aparna
Parthasarathy, Srinivas
Sundaram, Shiva
INTERSPEECH 2020, 2020: 384-388
[6] Multi-modal deep learning for landform recognition
Du, Lin
You, Xiong
Li, Ke
Meng, Liqiu
Cheng, Gong
Xiong, Liyang
Wang, Guangxia
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2019, 158: 63-75
[7] Multi-Modal Emotion Recognition From Speech and Facial Expression Based on Deep Learning
Cai, Linqin
Dong, Jiangong
Wei, Min
2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020: 5726-5729
[8] Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
Liu, Dong
Wang, Zhiyong
Wang, Lifeng
Chen, Longxi
FRONTIERS IN NEUROROBOTICS, 2021, 15
[9] Facial emotion recognition using multi-modal information
De Silva, LC
Miyasato, T
Nakatsu, R
ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997: 397-401
[10] A comprehensive framework for multi-modal hate speech detection in social media using deep learning
R. Prabhu
V. Seethalakshmi
Scientific Reports, 15 (1)
