Prosodic feature normalization for emotion recognition by using synthesized speech

Cited by: 1
Authors
Suzuki, Motoyuki [1]
Nakagawa, Shohei [1]
Kita, Kenji [1]
Affiliation
[1] Univ Tokushima, Inst Sci & Technol, Tokushima 7708506, Japan
Keywords
Emotion recognition of speech; prosodic feature normalization; synthesized speech
DOI
10.3233/978-1-61499-105-2-306
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Emotion recognition from speech signals is one of the most important technologies for natural conversation between humans and robots. Most emotion recognizers extract prosodic features from the input speech and use them for recognition. However, prosodic features change drastically depending on the uttered text. To normalize these text-dependent differences in prosodic features, we used a synthesized speech signal of the same text, because most speech synthesizers output speech with a "neutral" emotion. The prosodic features extracted from the input speech are normalized by using the prosodic features extracted from the synthesized speech. We propose two types of normalization: frame-level normalization and vector-level normalization. The experimental results showed that frame-level normalization is effective for two important emotional dimensions. The average normalized difference decreased by 0.41% for pleasantness and by 1.14% for arousal.
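The abstract does not say how the two normalizations are computed. Below is a minimal sketch of one plausible reading. It is not the authors' method.

The sketch makes these assumptions:
- The prosodic feature is an F0 contour with one value per voiced frame.
- The synthesized contour is aligned to the input by linear resampling. This is a hypothetical stand-in for whatever alignment the paper uses, for example DTW.
- "Frame-level" normalization means subtracting the aligned neutral contour frame by frame.
- "Vector-level" normalization means subtracting utterance-level statistic vectors. The choice of mean, standard deviation and range is hypothetical.

All function names are illustrative.

```python
from statistics import mean, pstdev


def resample(contour, length):
    """Linearly resample a 1-D contour to `length` frames.

    Hypothetical stand-in for time alignment between input and synthesized speech.
    """
    if length == 1:
        return [contour[0]]
    step = (len(contour) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        # Interpolate between the two neighbouring source frames.
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def frame_level_normalize(input_contour, synth_contour):
    """Subtract the aligned neutral (synthesized) contour frame by frame."""
    aligned = resample(synth_contour, len(input_contour))
    return [x - s for x, s in zip(input_contour, aligned)]


def utterance_vector(contour):
    """Hypothetical utterance-level feature vector: mean, std dev, range."""
    return [mean(contour), pstdev(contour), max(contour) - min(contour)]


def vector_level_normalize(input_contour, synth_contour):
    """Subtract the neutral utterance vector from the input utterance vector."""
    v_in = utterance_vector(input_contour)
    v_syn = utterance_vector(synth_contour)
    return [a - b for a, b in zip(v_in, v_syn)]


if __name__ == "__main__":
    emotional = [120.0, 125.0, 130.0, 140.0, 150.0]  # toy input F0 (Hz)
    neutral = [100.0, 110.0, 120.0]  # toy synthesized F0 for the same text
    print(frame_level_normalize(emotional, neutral))  # [20.0, 20.0, 20.0, 25.0, 30.0]
    print(vector_level_normalize(emotional, neutral))
```

Subtracting the neutral reference removes the part of the contour that the text alone would produce. The remaining deviation is what a downstream recognizer would see. In practice F0 is often compared in a log or semitone scale rather than in Hz. This sketch keeps Hz for readability.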
Pages: 306–313
Number of pages: 8