Emotion Recognition Based on Meta Bi-Modal Learning Model

Cited by: 0
Authors
Li Z. [1 ]
Sun Y. [1 ]
Zhang X. [1 ]
Zhou Y. [1 ]
Affiliations
[1] School of Information and Computer, Taiyuan University of Technology, Taiyuan
Source
Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications | 2023, Vol. 46, No. 5
Keywords
bi-modal; continuous emotion; cross-modal attention mechanism; discrete emotion; meta learning;
DOI
10.13190/j.jbupt.2022-220
Abstract
Most emotion recognition models rely on the single speech modality, which is prone to ambiguity and ignores the relationship between continuous and discrete emotions. To address these issues, a meta bi-modal learning (MBL) model is proposed, in which single-modal continuous emotion recognition, namely valence, activation, and control, assists the recognition of bi-modal discrete emotion. Feature fusion adopts a cross-modal attention mechanism, which effectively avoids the need to explicitly align the sequence data of the two modalities. During training of the auxiliary task, hard parameter sharing in multi-task learning enables information exchange across the three dimensions of valence, activation, and control. In addition, each speaker's utterances are treated as a small sample set so that the model adapts to different speakers, enhancing its generalization ability. Experimental results show that on the scripted and dialogue subsets of the interactive emotional dyadic motion capture (IEMOCAP) database, the MBL model achieves emotion recognition rates of 71.24% and 69.12%, respectively, demonstrating good performance. © 2023 Beijing University of Posts and Telecommunications. All rights reserved.
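The abstract names three mechanisms: cross-modal attention for fusing speech and text sequences, hard parameter sharing between the discrete-emotion main task and the valence/activation/control auxiliary task, and per-speaker meta-learning in the style of MAML (reference [10] below). The PyTorch sketch below illustrates one way these pieces could fit together; all module names, dimensions, pooling choices, and hyperparameters are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the mechanisms named in the abstract (assumptions
# throughout; not the paper's code).
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    """One modality queries the other, so no explicit frame/token
    alignment between the speech and text sequences is required."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_seq, context_seq):
        # query_seq: (batch, T_q, dim); context_seq: (batch, T_c, dim)
        fused, _ = self.attn(query_seq, context_seq, context_seq)
        return fused

class MBLSketch(nn.Module):
    """Hard parameter sharing: one fused trunk feeds both the
    discrete-emotion head (main task) and the valence/activation/
    control head (auxiliary task)."""
    def __init__(self, dim: int = 128, n_classes: int = 4):
        super().__init__()
        self.speech_attends_text = CrossModalAttention(dim)
        self.text_attends_speech = CrossModalAttention(dim)
        self.shared = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.discrete_head = nn.Linear(dim, n_classes)  # discrete emotion classes
        self.vac_head = nn.Linear(dim, 3)               # valence, activation, control

    def forward(self, speech_seq, text_seq):
        s = self.speech_attends_text(speech_seq, text_seq).mean(dim=1)  # pool over time
        t = self.text_attends_speech(text_seq, speech_seq).mean(dim=1)
        h = self.shared(torch.cat([s, t], dim=-1))
        return self.discrete_head(h), self.vac_head(h)

def adapt_to_speaker(model, support_batch, lr_inner=1e-2, steps=1):
    """First-order stand-in for the MAML inner loop (see [10]): each
    speaker's utterances form a small support set on which a copy of
    the parameters is briefly fine-tuned before evaluation."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr_inner)
    speech, text, y_discrete, y_vac = support_batch
    for _ in range(steps):
        logits, vac = adapted(speech, text)
        loss = F.cross_entropy(logits, y_discrete) + F.mse_loss(vac, y_vac)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted

In a full MAML setup, the query-set loss of each adapted copy would be backpropagated through the inner updates to the shared initialization; the first-order shortcut above just keeps the sketch short.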
Pages: 87-105
Page count: 18
References
15 in total
  • [1] ZHONG P X, WANG D, MIAO C Y., An affect-rich neural conversational model with biased attention and weighted cross-entropy loss, Proceedings of the AAAI Conference on Artificial Intelligence, 33, 1, pp. 7492-7500, (2019)
  • [2] ABDULLAH S M S A, AMEEN S Y A, SADEEQ M A M, et al., Multimodal emotion recognition using deep learning, Journal of Applied Science and Technology Trends, 2, 2, pp. 52-58, (2021)
  • [3] BATBAATAR E, LI M J, RYU K H., Semantic-emotion neural network for emotion recognition from text, IEEE Access, 7, pp. 111866-111878, (2019)
  • [4] KHALIL R A, JONES E, BABAR M I, et al., Speech emotion recognition using deep learning techniques: a review, IEEE Access, 7, pp. 117327-117345, (2019)
  • [5] CAI L Q, DONG J G, WEI M., Multi-modal emotion recognition from speech and facial expression based on deep learning, 2020 Chinese Automation Congress (CAC), pp. 5726-5729, (2021)
  • [6] LIN Z J, LONG Y F, DU J C, et al., A multi-modal sentiment recognition method based on multi-task learning, Acta Scientiarum Naturalium Universitatis Pekinensis, 57, 1, pp. 7-15, (2021)
  • [7] VERKHOLYAK O, DVOYNIKOVA A, KARPOV A., A bimodal approach for speech emotion recognition using audio and text, J Internet Serv Inf Secur, 11, pp. 80-96, (2021)
  • [8] RUSSELL J A., Core affect and the psychological construction of emotion, Psychological Review, 110, 1, pp. 145-172, (2003)
  • [9] VILALTA R, DRISSI Y., A perspective view and survey of meta-learning, Artificial Intelligence Review, 18, 2, pp. 77-95, (2002)
  • [10] FINN C, ABBEEL P, LEVINE S., Model-agnostic meta-learning for fast adaptation of deep networks, Proceedings of the 34th International Conference on Machine Learning, pp. 1126-1135, (2017)