A Large Scale Speech Sentiment Corpus

Authors
Chen, Eric Y. [1]
Lu, Zhiyun [2]
Xu, Hao [1]
Cao, Liangliang [1]
Zhang, Yu [1]
Fan, James [1]
Affiliations
[1] Google Inc, New York, NY 10011 USA
[2] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
sentiment; Switchboard; multimodal; speech
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We present a multimodal corpus for sentiment analysis based on the existing Switchboard-1 Telephone Speech Corpus released by the Linguistic Data Consortium. The corpus extends Switchboard-1 by adding sentiment labels from three different human annotators to every transcript segment. Each label takes one of three values: positive, negative, or neutral. Annotators were recruited through Google Cloud's data labeling service, and the labeling task was conducted over the internet. The corpus contains a total of 49,500 labeled utterances covering 140 hours of audio. To the best of our knowledge, this is the largest multimodal corpus for sentiment analysis that includes both speech and text features.
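The labeling scheme described above (three annotators, each choosing one of three classes per segment) can be modeled with a small record type. The sketch below is illustrative only: the class `LabeledSegment`, the field names, the segment IDs, and the majority-vote rule in `majority_label` are hypothetical and are not the corpus's actual file format or an aggregation method stated by the authors. It also computes the back-of-the-envelope average of roughly 10 seconds of audio per labeled utterance implied by 140 hours over 49,500 utterances, assuming the audio is spread evenly across utterances.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# The three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


@dataclass(frozen=True)
class LabeledSegment:
    """Hypothetical record: one transcript segment with three annotator labels."""

    segment_id: str
    labels: tuple[str, str, str]

    def __post_init__(self) -> None:
        # Enforce exactly three labels, each drawn from the allowed classes.
        if len(self.labels) != 3:
            raise ValueError("expected exactly 3 annotator labels")
        for label in self.labels:
            if label not in LABELS:
                raise ValueError(f"unknown label: {label!r}")


def majority_label(segment: LabeledSegment) -> str | None:
    """Assumed aggregation rule: return the label chosen by at least 2 of 3
    annotators, or None when all three annotators disagree."""
    label, count = Counter(segment.labels).most_common(1)[0]
    return label if count >= 2 else None


def mean_utterance_seconds(total_hours: float, num_utterances: int) -> float:
    """Average audio duration per utterance, assuming an even split."""
    return total_hours * 3600 / num_utterances


if __name__ == "__main__":
    agree = LabeledSegment("seg-0001", ("positive", "positive", "neutral"))
    split = LabeledSegment("seg-0002", ("positive", "negative", "neutral"))
    print(majority_label(agree))  # positive
    print(majority_label(split))  # None
    print(round(mean_utterance_seconds(140, 49_500), 2))  # 10.18
```

Returning `None` on a three-way split keeps genuinely ambiguous segments distinguishable from agreed-upon ones; a downstream user might instead keep all three labels as a soft target, which is one reason to store the raw per-annotator labels rather than a single aggregated value.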
Pages: 6549-6555
Number of pages: 7