Multilingual Speech Emotion Recognition System based on a Three-layer Model

Cited by: 9
Authors
Li, Xingfeng [1 ]
Akagi, Masato [1 ]
Affiliations
[1] Japan Adv Inst Sci & Technol, 1-1 Asahidai, Nomi, Ishikawa 9231292, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
emotion recognition in speech; three-layer model; emotion dimension;
DOI
10.21437/Interspeech.2016-645
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Speech Emotion Recognition (SER) systems currently focus on classifying emotions within a single language. Because optimal acoustic feature sets are strongly language dependent, the selection of common features and the need for retraining remain challenging obstacles to a generalized SER system that works across multiple languages. In this paper, we therefore present an SER system for the multilingual scenario from the perspective of human perceptual processing. The goal is twofold. First, to predict multilingual emotion dimensions as accurately as human annotators do; to this end, we study a three-layer model consisting of acoustic features, semantic primitives, and emotion dimensions, combined with a Fuzzy Inference System (FIS). Second, drawing on knowledge of how humans perceive emotion across languages in the dimensional space, we adopt direction and distance as common features for detecting multilingual emotions. The resulting estimation performance for emotion dimensions is comparable to human evaluation, and the classification rates achieved are close to those of a monolingual SER system.
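For concreteness, below is a minimal sketch of the two ideas the abstract describes: a cascade from acoustic features through semantic primitives to emotion dimensions, and direction/distance features computed in the valence-arousal plane. The paper trains a Fuzzy Inference System at each layer; plain least-squares regressors stand in here only to keep the sketch self-contained and runnable, and all data, dimensionalities, and the (0, 0) neutral reference are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def fit_linear(X, Y):
    # Least-squares map X -> Y with a bias column; a stand-in for the
    # paper's Fuzzy Inference System, chosen only so the sketch runs.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def apply_linear(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ W

def direction_distance(va, neutral=(0.0, 0.0)):
    # Polar coordinates of valence-arousal points around a neutral origin;
    # the (0, 0) reference point is an assumption of this sketch.
    dv = va[:, 0] - neutral[0]
    da = va[:, 1] - neutral[1]
    return np.arctan2(da, dv), np.hypot(dv, da)

# Toy data standing in for real corpora: per-utterance acoustic statistics,
# listener ratings of semantic primitives, and valence-arousal annotations.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(100, 20))
primitives = rng.normal(size=(100, 5))
dimensions = rng.normal(size=(100, 2))

W1 = fit_linear(acoustic, primitives)    # layer 1 (acoustic) -> layer 2
W2 = fit_linear(primitives, dimensions)  # layer 2 (primitives) -> layer 3

va_hat = apply_linear(W2, apply_linear(W1, acoustic))
direction, distance = direction_distance(va_hat)
print(direction[:3], distance[:3])
```

The intuition behind the final step, per the abstract, is that direction (roughly, emotion category) and distance from neutral (roughly, intensity) abstract away from language-specific acoustic scales, so the same two features can serve emotion detection across languages.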
Pages: 3608-3612
Number of pages: 5