Prosodic feature normalization for emotion recognition by using synthesized speech

Cited by: 1
Authors
Suzuki, Motoyuki [1]
Nakagawa, Shohei [1]
Kita, Kenji [1]
Affiliations
[1] Univ Tokushima, Inst Sci & Technol, Tokushima 7708506, Japan
Keywords
Emotion recognition of speech; prosodic feature normalization; synthesized speech
DOI
10.3233/978-1-61499-105-2-306
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition from speech signals is one of the most important technologies for natural conversation between humans and robots. Most emotion recognizers extract prosodic features from the input speech and use them for emotion recognition. However, prosodic features change drastically depending on the uttered text. To normalize the differences in prosodic features that stem from the uttered text, we use a synthesized speech signal. Most speech synthesizers output speech signals with a "neutral" emotion. After prosodic features are extracted from the input speech, they are normalized using prosodic features extracted from the synthesized speech. We propose two types of normalization: frame-level normalization and vector-level normalization. The experimental results showed that frame-level normalization is effective for two important emotional dimensions, pleasantness and arousal. The average normalized difference decreased by 0.41% (pleasantness) and 1.14% (arousal).
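The abstract does not give the formulas behind the two normalization types, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a prosodic feature is available as a per-frame contour (for example F0 in Hz), that frame-level normalization subtracts a time-aligned synthesized contour frame by frame, and that vector-level normalization subtracts utterance-level statistics. The function names, the linear resampling used in place of a real time alignment, and the choice of statistics (mean, standard deviation, range) are all hypothetical.

```python
from statistics import mean, pstdev


def resample(contour, n):
    """Linearly resample a per-frame contour to n frames.

    Assumption: a crude stand-in for aligning synthesized and input speech.
    """
    if n == 1 or len(contour) == 1:
        return [contour[0]] * n
    scale = (len(contour) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def frame_level_normalize(input_contour, synth_contour):
    """Subtract the (resampled) synthesized contour from the input, frame by frame."""
    synth = resample(synth_contour, len(input_contour))
    return [x - s for x, s in zip(input_contour, synth)]


def utterance_vector(contour):
    """Hypothetical utterance-level feature vector: mean, std. deviation, range."""
    return [mean(contour), pstdev(contour), max(contour) - min(contour)]


def vector_level_normalize(input_contour, synth_contour):
    """Subtract the synthesized utterance-level vector from the input's vector."""
    a = utterance_vector(input_contour)
    b = utterance_vector(synth_contour)
    return [x - y for x, y in zip(a, b)]


# Toy contours (made-up numbers, not data from the paper).
emotional = [100, 120, 140]
neutral_synth = [100, 110, 120]
frame_norm = frame_level_normalize(emotional, neutral_synth)
vector_norm = vector_level_normalize(emotional, neutral_synth)
```

In this reading, the frame-level variant keeps the time course of the deviation from the neutral rendition, while the vector-level variant only compares summary statistics of the two utterances.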
Pages: 306-313
Number of pages: 8