You're Not You When You're Angry: Robust Emotion Features Emerge by Recognizing Speakers

Cited by: 5
Authors
Aldeneh, Zakaria [1]
Provost, Emily Mower [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA)
Keywords
Feature extraction; Emotion recognition; Speech recognition; Acoustics; Task analysis; Neural networks; Speaker recognition; speaker embeddings; speaker representations; transfer learning; RECOGNITION; SPEECH; SIGNALS
DOI
10.1109/TAFFC.2021.3086050
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The robustness of an acoustic emotion recognition system hinges on first having access to features that represent an acoustic input signal. These representations should abstract extraneous low-level variations present in acoustic signals and only capture speaker characteristics relevant for emotion recognition. Previous research has demonstrated that, in other classification tasks, when large labeled datasets are available, neural networks trained on these data learn to extract robust features from the input signal. However, the datasets used for developing emotion recognition systems remain significantly smaller than those used for developing other speech systems. Thus, acoustic emotion recognition systems remain in need of robust feature representations. In this article, we study the utility of speaker embeddings, representations extracted from a trained speaker recognition network, as robust features for detecting emotions. We first study the relationship between emotions and speaker embeddings and demonstrate how speaker embeddings highlight the differences that exist between neutral speech and emotionally expressive speech. We quantify the modulations that variations in emotional expression incur on speaker embeddings and show how these modulations are greater than those incurred from lexical variations in an utterance. Finally, we demonstrate how speaker embeddings can be used as a replacement for traditional low-level acoustic features for emotion recognition.
Pages: 1351-1362
Page count: 12