Prosodic feature normalization for emotion recognition by using synthesized speech

Cited: 1
Authors
Suzuki, Motoyuki [1 ]
Nakagawa, Shohei [1 ]
Kita, Kenji [1 ]
Institutions
[1] Univ Tokushima, Inst Sci & Technol, Tokushima 7708506, Japan
Keywords
Emotion recognition of speech; prosodic feature normalization; synthesized speech;
DOI
10.3233/978-1-61499-105-2-306
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition from speech signals is one of the most important technologies for natural conversation between humans and robots. Most emotion recognizers extract prosodic features from the input speech in order to perform emotion recognition. However, prosodic features change drastically depending on the uttered text. In order to normalize these text-dependent differences in prosodic features, we use a synthesized speech signal. Most speech synthesizers output speech signals with a "neutral" emotion. After prosodic features are extracted from the input speech, they are normalized using prosodic features extracted from the synthesized speech. We propose two types of normalization: frame-level normalization and vector-level normalization. The experimental results showed that frame-level normalization is effective for two important emotional dimensions; the average normalized difference decreased by 0.41% (pleasantness) and 1.14% (arousal).
Pages: 306-313 (8 pages)
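The two normalization schemes named in the abstract can be illustrated with a minimal sketch. The paper does not give implementation details, so everything below is an assumption: frame-level normalization is shown as a per-frame subtraction of the neutral synthesized prosodic track (with linear resampling standing in for whatever time alignment the authors use), and vector-level normalization as a subtraction of per-utterance summary vectors (mean and standard deviation chosen purely for illustration).

```python
import numpy as np

def frame_level_normalize(input_track, synth_track):
    """Frame-level normalization (sketch): subtract the neutral
    synthesized prosodic track from the input track frame by frame.

    Assumption: the paper does not specify the time-alignment method;
    linear resampling of the synthesized track is used here.
    """
    input_track = np.asarray(input_track, dtype=float)
    synth_track = np.asarray(synth_track, dtype=float)
    # Resample the synthesized track to match the input length.
    idx = np.linspace(0.0, len(synth_track) - 1, num=len(input_track))
    synth_resampled = np.interp(idx, np.arange(len(synth_track)), synth_track)
    return input_track - synth_resampled

def vector_level_normalize(input_track, synth_track):
    """Vector-level normalization (sketch): build one prosodic summary
    vector per utterance and subtract the synthesized-speech vector
    from the input vector. Mean and standard deviation are an
    illustrative feature choice, not the paper's."""
    inp = np.asarray(input_track, dtype=float)
    syn = np.asarray(synth_track, dtype=float)
    inp_vec = np.array([inp.mean(), inp.std()])
    syn_vec = np.array([syn.mean(), syn.std()])
    return inp_vec - syn_vec
```

With an emotional F0 contour and the synthesizer's neutral contour for the same text, the residual after either normalization is what remains attributable to emotion rather than to the uttered text.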