ShEMO: a large-scale validated database for Persian speech emotion detection

Cited by: 35
Authors
Nezami, Omid Mohamad [1 ]
Lou, Paria Jamshid [2 ]
Karami, Mansoureh [2 ]
Affiliations
[1] Islamic Azad Univ, Bijar Branch, Bijar, Iran
[2] Sharif Univ Technol, Tehran, Iran
Keywords
Emotional speech; Speech database; Emotion detection; Benchmark; Persian; RECOGNITION; MODEL; AGREEMENT; VALENCE; AROUSAL
DOI
10.1007/s10579-018-9427-x
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
This paper introduces a large-scale, validated database for Persian called the Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. ShEMO covers speech samples from 87 native Persian speakers for five basic emotions (anger, fear, happiness, sadness and surprise) as well as the neutral state. Twelve annotators label the underlying emotional state of each utterance, and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64%, which is interpreted as substantial agreement. We also present benchmark results based on classification methods common in speech emotion detection. According to the experiments, a support vector machine achieves the best results for both the gender-independent model (58.2%) and the gender-dependent models (female = 59.4%, male = 57.6%). ShEMO will be available free of charge for academic purposes to provide a baseline for further research on Persian emotional speech.
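The labelling pipeline the abstract describes (majority voting over annotator labels, then Fleiss' kappa for inter-annotator agreement, where 0.61-0.80 is conventionally read as "substantial" on the Landis-Koch scale) can be sketched in a few lines. This is a minimal illustration, not the authors' released code; the toy votes below are invented, and ShEMO itself uses 12 annotators over six classes.

    # Minimal sketch of majority voting and Fleiss' kappa; toy data only.
    from collections import Counter
    import numpy as np

    EMOTIONS = ["anger", "fear", "happiness", "sadness", "surprise", "neutral"]

    def majority_vote(labels):
        """Most frequent label, or None on a tie (would need re-annotation)."""
        (top, top_n), *rest = Counter(labels).most_common()
        return None if rest and rest[0][1] == top_n else top

    def fleiss_kappa(ratings):
        """Fleiss' kappa for an (items x categories) matrix of rating counts."""
        ratings = np.asarray(ratings, dtype=float)
        n = ratings.sum(axis=1)[0]                  # raters per item (constant)
        p_j = ratings.sum(axis=0) / ratings.sum()   # marginal category shares
        P_i = (np.square(ratings).sum(axis=1) - n) / (n * (n - 1))
        P_e = np.square(p_j).sum()                  # chance agreement
        return (P_i.mean() - P_e) / (1 - P_e)

    # Three toy utterances, 12 annotators each.
    votes = [
        ["anger"] * 10 + ["sadness"] * 2,
        ["happiness"] * 7 + ["surprise"] * 5,
        ["neutral"] * 12,
    ]
    counts = [[v.count(e) for e in EMOTIONS] for v in votes]
    print([majority_vote(v) for v in votes])  # ['anger', 'happiness', 'neutral']
    print(round(fleiss_kappa(counts), 2))     # 0.63 on this toy data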
Pages: 1-16
Page count: 16