Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

Cited by: 46

Authors:

Zhang, Shiqing [1]

Yang, Yijiao [1]

Chen, Chen [1]

Zhang, Xingnan [1]

Leng, Qingming [2]

Zhao, Xiaoming [1]

Affiliations:

[1] Taizhou Univ, Inst Intelligent Informat Proc, Taizhou 318000, Zhejiang, Peoples R China

[2] Jiujiang Univ, Sch Elect & Informat Engn, Jiujiang 332005, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Volume 237

Funding:

National Science Foundation (United States); National Natural Science Foundation of China;

Keywords:

Multimodal emotion recognition; Deep learning; Feature extraction; Multimodal information fusion; review; FACIAL EXPRESSION RECOGNITION; INFORMATION FUSION; AFFECTIVE FEATURES; SENTIMENT ANALYSIS; NEURAL-NETWORKS; SPEECH; DATABASES; MODEL; DIMENSIONALITY; SIGNALS;

DOI:

10.1016/j.eswa.2023.121692

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Emotion recognition has recently attracted extensive interest due to its significant applications in human-computer interaction. The expression of human emotion depends on various verbal and non-verbal cues such as audio, visual, and text. Emotion recognition is thus well suited to being treated as a multimodal rather than single-modal learning problem. Owing to their powerful feature learning capability, deep learning methods have recently been widely leveraged to capture high-level emotional feature representations for multimodal emotion recognition (MER). This paper therefore makes the first effort to comprehensively summarize recent advances in deep learning-based multimodal emotion recognition (DL-MER) involving audio, visual, and text modalities. We focus on the following: (1) MER milestones are given to summarize the development trend of MER, and conventional multimodal emotional datasets are provided; (2) the core principles of typical deep learning models and their recent advancements are overviewed; (3) a systematic survey and taxonomy is provided to cover the state-of-the-art methods related to two key steps in a MER system, namely feature extraction and multimodal information fusion; (4) the research challenges and open issues in this field are discussed, and promising future directions are given.

Pages: 23
