Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

Cited by: 6
Authors
Triantafyllopoulos, Andreas [1]
Wagner, Johannes [2]
Wierstorf, Hagen [2]
Schmitt, Maximilian [2]
Reichel, Uwe [2]
Eyben, Florian [2]
Burkhardt, Felix [2]
Schuller, Bjoern W. [1,2,3]
Affiliations
[1] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
[2] audEERING GmbH, Gilching, Germany
[3] Imperial Coll, GLAM Grp Language Audio & Mus, London, England
Keywords
speech emotion recognition; transformers;
DOI
10.21437/Interspeech.2022-10371
CLC classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in a self-supervised manner with the goal of improving automatic speech recognition performance, and thus of capturing linguistic information. In this work, we investigate the extent to which this information is exploited during SER fine-tuning. Using a reproducible methodology based on open-source tools, we synthesise prosodically neutral speech utterances while varying the sentiment of the text. The valence predictions of the transformer model are highly reactive to positive and negative sentiment content, as well as to negations, but not to intensifiers or reducers, while none of these linguistic features affects arousal or dominance. These findings show that transformers can successfully leverage linguistic information to improve their valence predictions, and that linguistic analysis should be included in their testing.
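The abstract outlines a probing protocol: render text probes that differ only in sentiment-bearing content as prosodically neutral speech, feed them to a fine-tuned transformer SER model, and compare the arousal/dominance/valence predictions across conditions. The Python sketch below illustrates that idea under stated assumptions; synthesise_neutral_speech and predict_adv are hypothetical placeholders (not the authors' tooling) standing in for an open-source TTS system and a transformer SER model, and the probe sentences are illustrative, not the paper's actual stimuli. Both placeholders are mocked with dummy signals so the script runs end to end; only the probing logic is the point.

```python
# Minimal sketch of the probing protocol described in the abstract.
# Assumptions (not from the paper's released code):
#   - synthesise_neutral_speech(text): placeholder for an open-source TTS system
#     producing prosodically neutral speech for the given text
#   - predict_adv(waveform, sr): placeholder for a fine-tuned transformer SER model
#     returning (arousal, dominance, valence) scores in [0, 1]
import statistics
import numpy as np

# Text probes: the same carrier sentences with only the sentiment-bearing words varied.
PROBES = {
    "positive":    ["This is a wonderful day.", "The food was excellent."],
    "negative":    ["This is a terrible day.", "The food was awful."],
    "negated_pos": ["This is not a wonderful day.", "The food was not excellent."],
    "intensified": ["This is a very wonderful day.", "The food was very excellent."],
    "neutral":     ["This is a day.", "The food was there."],
}

def synthesise_neutral_speech(text: str, sr: int = 16000) -> np.ndarray:
    """Placeholder TTS: returns a dummy waveform of fixed length.

    Swap in a real TTS call that renders the text with flat, neutral prosody."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(sr * 2).astype(np.float32) * 0.01

def predict_adv(waveform: np.ndarray, sr: int = 16000) -> tuple[float, float, float]:
    """Placeholder SER model: returns dummy (arousal, dominance, valence) scores.

    Swap in a fine-tuned transformer SER model with a dimensional regression head."""
    energy = float(np.mean(waveform ** 2))
    return 0.5, 0.5, min(1.0, energy * 1e3)

def probe_valence() -> dict[str, float]:
    """Synthesise every probe sentence and average the valence per condition."""
    means = {}
    for condition, sentences in PROBES.items():
        valences = []
        for text in sentences:
            wav = synthesise_neutral_speech(text)
            _, _, valence = predict_adv(wav)
            valences.append(valence)
        means[condition] = statistics.mean(valences)
    return means

if __name__ == "__main__":
    for condition, mean_valence in probe_valence().items():
        print(f"{condition:12s} mean valence = {mean_valence:.3f}")
```

With real TTS and SER components substituted for the placeholders, the pattern reported in the abstract would appear as a clear valence gap between positive, negated, and negative conditions alongside essentially flat arousal and dominance scores.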
Pages: 146-150
Number of pages: 5
Related papers
50 items in total
  • [21] Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions
    Nagase, Ryotaro
    Fukumori, Takahiro
    Yamashita, Yoichi
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 725 - 730
  • [22] MaxMViT-MLP: Multiaxis and Multiscale Vision Transformers Fusion Network for Speech Emotion Recognition
    Ong, Kah Liang
    Lee, Chin Poo
    Lim, Heng Siong
    Lim, Kian Ming
    Alqahtani, Ali
    IEEE ACCESS, 2024, 12 : 18237 - 18250
  • [23] MelTrans: Mel-Spectrogram Relationship-Learning for Speech Emotion Recognition via Transformers
    Li, Hui
    Li, Jiawen
    Liu, Hai
    Liu, Tingting
    Chen, Qiang
    You, Xinge
    SENSORS, 2024, 24 (17)
  • [24] Multiresolution and Multimodal Speech Recognition with Transformers
    Paraskevopoulos, Georgios
    Parthasarathy, Srinivas
    Khare, Aparna
    Sundaram, Shiva
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 2381 - 2387
  • [25] Speech emotion recognition based on emotion perception
    Liu, Gang
    Cai, Shifang
    Wang, Ce
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [27] Autoencoder With Emotion Embedding for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    IEEE ACCESS, 2021, 9 : 51231 - 51241
  • [29] English speech emotion recognition method based on speech recognition
    Liu, Man
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (2) : 391 - 398