MULTIMODAL TRANSFORMER WITH LEARNABLE FRONTEND AND SELF ATTENTION FOR EMOTION RECOGNITION

Citations: 11
Authors:
Dutta, Soumya
Ganapathy, Sriram
Source:
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022
Keywords:
Multi-modal emotion recognition; Transformer networks; self-attention models; learnable front-end; SENTIMENT ANALYSIS; FUSION
DOI: 10.1109/ICASSP43922.2022.9747723
Chinese Library Classification: O42 [Acoustics]
Discipline Codes: 070206; 082403
Abstract:
In this work, we propose a novel approach for multi-modal emotion recognition from conversations using speech and text. The audio representations are learned jointly with a learnable audio front-end (LEAF) model feeding a CNN-based classifier. The text representations are derived from pre-trained bidirectional encoder representations from transformers (BERT) combined with a gated recurrent unit (GRU) network. The textual and audio representations are separately processed using a bidirectional GRU network with self-attention. The multi-modal information is then fused by a transformer that takes the utterance-level textual and audio embeddings as input. The experiments are performed on the IEMOCAP database, where we show that the proposed framework improves over the current state-of-the-art results under all the common test settings, primarily due to the improved emotion recognition performance achieved in the audio domain. We also show that the model is more robust to textual errors caused by an automatic speech recognition (ASR) system.
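As an illustration of the pipeline the abstract describes, the following is a minimal, hypothetical PyTorch sketch: per-modality bidirectional GRU encoders with self-attention pooling, followed by a small transformer that fuses the two utterance-level embeddings. The LEAF front-end and BERT are stood in for by precomputed feature sequences, and all dimensions, layer counts, and the four-class output are illustrative assumptions, not the paper's actual configuration.

import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Bidirectional GRU over a frame/token sequence, pooled by self-attention."""
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)    # one attention score per step
        self.out_dim = 2 * hidden

    def forward(self, x):                        # x: (batch, time, in_dim)
        h, _ = self.gru(x)                       # (batch, time, 2*hidden)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        return (w * h).sum(dim=1)                # utterance-level embedding

class MultimodalEmotionModel(nn.Module):
    def __init__(self, audio_dim, text_dim, d_model=256, n_classes=4):
        super().__init__()
        self.audio_enc = ModalityEncoder(audio_dim)
        self.text_enc = ModalityEncoder(text_dim)
        self.proj_a = nn.Linear(self.audio_enc.out_dim, d_model)
        self.proj_t = nn.Linear(self.text_enc.out_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, audio_seq, text_seq):
        # Treat the two utterance-level embeddings as a length-2 token
        # sequence so the transformer can attend across modalities.
        a = self.proj_a(self.audio_enc(audio_seq))
        t = self.proj_t(self.text_enc(text_seq))
        fused = self.fusion(torch.stack([a, t], dim=1))  # (batch, 2, d_model)
        return self.head(fused.mean(dim=1))              # emotion logits

# Shape check with dummy LEAF-like (40-dim) and BERT-like (768-dim) features.
model = MultimodalEmotionModel(audio_dim=40, text_dim=768)
logits = model(torch.randn(2, 300, 40), torch.randn(2, 25, 768))
print(logits.shape)  # torch.Size([2, 4])

The length-2 fusion sequence mirrors the utterance-level fusion described in the abstract; in a real system the paper's front-ends (LEAF for audio, BERT for text) would supply the feature sequences in place of the random tensors.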
Pages: 6917-6921 (5 pages)
Related Papers (10 of 50 shown):
  • [1] HyFusER: Hybrid Multimodal Transformer for Emotion Recognition Using Dual Cross Modal Attention
    Yi, Moung-Ho
    Kwak, Keun-Chang
    Shin, Ju-Hyun
    APPLIED SCIENCES-BASEL, 2025, 15 (03)
  • [2] MULTIMODAL TRANSFORMER FUSION FOR CONTINUOUS EMOTION RECOGNITION
    Huang, Jian
    Tao, Jianhua
    Liu, Bin
    Lian, Zheng
    Niu, Mingyue
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3507 - 3511
  • [3] Multimodal Transformer Fusion for Emotion Recognition: A Survey
    Belaref, Amdjed
    Seguier, Renaud
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024: 107 - 113
  • [4] Joint Multimodal Transformer for Emotion Recognition in the Wild
    Waligora, Paul
    Aslam, Muhammad Haseeb
    Zeeshan, Muhammad Osama
    Belharbi, Soufiane
    Koerich, Alessandro Lameiras
    Pedersoli, Marco
    Bacon, Simon
    Granger, Eric
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2024: 4625 - 4635
  • [5] Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion
    Siriwardhana, Shamane
    Kaluarachchi, Tharindu
    Billinghurst, Mark
    Nanayakkara, Suranga
    IEEE ACCESS, 2020, 8 (08): 176274 - 176285
  • [6] Self-supervised representation learning using multimodal Transformer for emotion recognition
    Goetz, Theresa
    Arora, Pulkit
    Erick, F. X.
    Holzer, Nina
    Sawant, Shrutika
    PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON SENSOR-BASED ACTIVITY RECOGNITION AND ARTIFICIAL INTELLIGENCE, IWOAR 2023, 2023
  • [7] Correlated Attention Networks for Multimodal Emotion Recognition
    Qiu, Jie-Lin
    Li, Xiao-Yu
    Hu, Kai
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018: 2656 - 2660
  • [8] MULTIMODAL CROSS- AND SELF-ATTENTION NETWORK FOR SPEECH EMOTION RECOGNITION
    Sun, Licai
    Liu, Bin
    Tao, Jianhua
    Lian, Zheng
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 4275 - 4279
  • [9] Focus-attention-enhanced Crossmodal Transformer with Metric Learning for Multimodal Speech Emotion Recognition
    Kim, Keulbit
    Cho, Namhyun
    INTERSPEECH 2023, 2023: 2673 - 2677
  • [10] Noise-Resistant Multimodal Transformer for Emotion Recognition
    Liu, Yuanyuan
    Zhang, Haoyu
    Zhan, Yibing
    Chen, Zijing
    Yin, Guanghao
    Wei, Lin
    Chen, Zhe
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (05): 3020 - 3040