Contribution of modulation spectral features for cross-lingual speech emotion recognition under noisy reverberant conditions

Authors
Guo, Taiyang [1]
Li, Sixia [1]
Kidani, Shunsuke [1]
Okada, Shogo [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, 1-1 Asahidai, Nomi, Ishikawa 9231292, Japan
Funding
Japan Society for the Promotion of Science
DOI
10.1109/APSIPAASC58517.2023.10317449
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling multiple languages under noisy reverberant conditions has become increasingly important for speech emotion recognition (SER). Previous studies found that modulation spectral features (MSFs) are robust to noisy reverberant conditions for SER. However, they mainly focused on specific languages; the universality of MSFs across languages is still unclear. To address this issue, we compared MSFs, hand-crafted features, Wav2Vec2.0-based features, and MSFs+hand-crafted features for SER on four languages under 12 noisy reverberant conditions. Intra-lingual results showed that MSFs+hand-crafted features performed best under most conditions for all languages. Inter-lingual results showed that MSFs performed best under most conditions for the test languages, except when training on a tonal language and testing on the others. The results demonstrate that MSFs are robust for multilingual SER under noisy reverberant conditions and suggest that MSFs are potentially language-independent features for nontonal languages.
Pages: 2221-2227
Number of pages: 7