Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

Cited by: 6
Authors
Triantafyllopoulos, Andreas [1]
Wagner, Johannes [2]
Wierstorf, Hagen [2]
Schmitt, Maximilian [2]
Reichel, Uwe [2]
Eyben, Florian [2]
Burkhardt, Felix [2]
Schuller, Bjoern W. [1,2,3]
Affiliations
[1] University of Augsburg, Chair of Embedded Intelligence for Health Care & Wellbeing, Augsburg, Germany
[2] audEERING GmbH, Gilching, Germany
[3] Imperial College London, Group on Language, Audio, & Music (GLAM), London, England
Source
INTERSPEECH 2022
Keywords
speech emotion recognition; transformers;
DOI
10.21437/Interspeech.2022-10371
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in a self-supervised manner with the goal of improving automatic speech recognition performance, and thus of capturing linguistic information. In this work, we investigate the extent to which this information is exploited during SER fine-tuning. Using a reproducible methodology based on open-source tools, we synthesise prosodically neutral speech utterances while varying the sentiment of the text. The valence predictions of the transformer model are highly sensitive to positive and negative sentiment content, as well as to negations, but not to intensifiers or reducers, while none of these linguistic features affects arousal or dominance. These findings show that transformers can successfully leverage linguistic information to improve their valence predictions, and that linguistic analysis should be included in their testing.
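
The probing recipe summarised in the abstract can be outlined in a short script. The sketch below is an illustration only, under assumptions not taken from the paper: synthesise_neutral and predict_avd are hypothetical placeholders for an open-source TTS front end and a fine-tuned dimensional SER transformer, and the probe sentences are made-up examples rather than the paper's actual stimuli.

from typing import Tuple
import numpy as np


def synthesise_neutral(text: str) -> np.ndarray:
    # Hypothetical stand-in for an open-source TTS system that renders `text`
    # with a fixed voice and flat (prosodically neutral) intonation.
    # Here it returns one second of 16 kHz silence so the sketch runs as-is.
    return np.zeros(16000, dtype=np.float32)


def predict_avd(waveform: np.ndarray) -> Tuple[float, float, float]:
    # Hypothetical stand-in for a dimensional SER transformer mapping a
    # waveform to (arousal, dominance, valence) scores in [0, 1]. A real probe
    # would load a fine-tuned model here; this stub returns neutral values so
    # the script is self-contained.
    return 0.5, 0.5, 0.5


# Example probe sentences (not the paper's stimuli): identical carrier
# sentences whose only difference is the sentiment-bearing content,
# plus a negated variant.
probes = {
    "positive": "This was a wonderful experience.",
    "negative": "This was a terrible experience.",
    "negated": "This was not a wonderful experience.",
}

# Because prosody is held neutral by construction, any difference between the
# valence predictions must stem from the linguistic content of the text.
for label, text in probes.items():
    arousal, dominance, valence = predict_avd(synthesise_neutral(text))
    print(f"{label:>8}: arousal={arousal:.2f}  dominance={dominance:.2f}  valence={valence:.2f}")

Extending the probe set with intensifier and reducer variants (e.g. "very wonderful", "slightly wonderful") would mirror the remaining conditions mentioned in the abstract.
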
Pages: 146-150
Page count: 5
Related Papers
50 records in total
  • [1] Linguistic knowledge and empirical methods in speech recognition
    Stolcke, A
    AI MAGAZINE, 1997, 18 (04) : 25 - 31
  • [2] CUBIC KNOWLEDGE DISTILLATION FOR SPEECH EMOTION RECOGNITION
    Lou, Zhibo
    Otake, Shinta
    Li, Zhengxiao
    Kawakami, Rei
    Inoue, Nakamasa
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 5705 - 5709
  • [3] Multistage linguistic conditioning of convolutional layers for speech emotion recognition
    Triantafyllopoulos, Andreas
    Reichel, Uwe
    Liu, Shuo
    Huber, Stephan
    Eyben, Florian
    Schuller, Bjoern W.
    FRONTIERS IN COMPUTER SCIENCE, 2023, 5
  • [4] Emotion Recognition from Speech using Prosodic and Linguistic Features
    Pervaiz, Mahwish
    Khan, Tamim Ahmed
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (08) : 84 - 90
  • [5] Personalised Emotion Recognition Utilising Speech Signal and Linguistic Cues
    Ramya, H. R.
    Bhatt, Mahabaleswara Ram
    2019 11TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS (COMSNETS), 2019, : 856 - 860
  • [6] ScSer: Supervised Contrastive Learning for Speech Emotion Recognition using Transformers
    Alaparthi, Varun Sai
    Pasam, Tejeswara Reddy
    Inagandla, Deepak Abhiram
    Prakash, Jay
    Singh, Pramod Kumar
    2022 15TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI), 2022,
  • [7] Applying Generative Adversarial Networks and Vision Transformers in Speech Emotion Recognition
    Heracleous, Panikos
    Fukayama, Satoru
    Ogata, Jun
    Mohammad, Yasser
LECTURE NOTES IN COMPUTER SCIENCE, 2022, 13519 LNCS : 67 - 75
  • [8] Spontaneous Speech Emotion Recognition using Prior Knowledge
    Chakraborty, Rupayan
    Pandharipande, Meghna
    Kopparapu, Sunil Kumar
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2866 - 2871
  • [9] Emotion recognition by speech signal characteristics (linguistic, clinical, informative aspects)
    Prokofyeva, L. P.
    Plastun, I. L.
Filippova, N. V.
    Matveeva, L. Yu
Plastun, N. S.
SIBIRSKII FILOLOGICHESKII ZHURNAL, 2021, (02): 325 - 336
  • [10] Speaker and gender dependencies in within/cross linguistic Speech Emotion Recognition
Chakhtouna, A.
Sekkate, S.
Adib, A.
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2023, 26 (03) : 609 - 625