Music Theory-Inspired Acoustic Representation for Speech Emotion Recognition

Cited by: 5
Authors
Li, Xingfeng [1]
Shi, Xiaohan [2]
Hu, Desheng [3]
Li, Yongwei [4]
Zhang, Qingchen [1]
Wang, Zhengxia [5]
Unoki, Masashi [6]
Akagi, Masato [6]
Affiliations
[1] Hainan Univ, Grad Sch Comp Sci & Technol, Haikou 570288, Peoples R China
[2] Nagoya Univ, Sch Informat Sci, Nagoya 4648601, Japan
[3] Taiyuan Univ Technol, Coll Informat & Comp, Taiyuan 030024, Peoples R China
[4] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[5] Hainan Univ, Sch Comp Sci & Technol, Haikou 570288, Peoples R China
[6] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi 9231292, Japan
Funding
National Natural Science Foundation of China
Keywords
Affective computing; speech emotion recognition; acoustic representation; music theory and speech analysis; PERCEPTION; EXPRESSION; PATTERNS; FEATURES; PITCH; PERSPECTIVE; MODALITIES; KNOWLEDGE; INTERVALS; COGNITION;
DOI
10.1109/TASLP.2023.3289312
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This research presents a music theory-inspired acoustic representation (hereafter, MTAR) for improving speech emotion recognition. Emotion recognition in speech and in music has developed in parallel, yet how a music theory-based representation can help interpret speech emotion remains relatively poorly understood. In the present study, we use music theory to study representative acoustics associated with emotion in speech, drawing on both vocal emotion expression and auditory emotion perception. In experiments assessing the role and effectiveness of the proposed representation in classifying discrete emotion categories and predicting continuous emotion dimensions, MTAR shows promising performance compared with widely used emotion recognition features based on the spectrogram, Mel-spectrogram, Mel-frequency cepstral coefficients, and VGGish, as well as the large baseline feature sets of the INTERSPEECH challenges. This proposal opens up a novel research avenue in developing a computational acoustic representation of speech emotion via music theory.
Pages: 2534-2547
Page count: 14