A Review of the Mandarin-English Code-switching Corpus: SEAME

Cited by: 0
Authors
Lee, Grandee [1]
Ho, Thi-Nga [2]
Chng, Eng-Siong [2]
Li, Haizhou [1]
Affiliations
[1] Natl Univ Singapore, Elect & Comp Engn Dept, Singapore 117583, Singapore
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Source
2017 International Conference on Asian Language Processing (IALP) | 2017
Keywords
Code-switching corpus; Mandarin English corpus; SEAME
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we report the development of the South East Asia Mandarin-English (SEAME) corpus, which comprised 63 hours of transcribed spontaneous Mandarin-English code-switching speech in its first release and has since been updated with an additional 129 hours of transcribed speech. The corpus was developed for code-switching speech recognition research, such as large-vocabulary continuous speech recognition (LVCSR), language recognition, and language segmentation. It has been publicly available through the Linguistic Data Consortium (LDC) since 2015. The corpus was recorded in unscripted interview and conversation settings and therefore consists of spontaneous speech. This paper presents comprehensive statistics and analysis of the updated corpus in terms of its composition, speaker profiles, and code-switching characteristics. It also reviews the corpus's suitability for various kinds of code-switching research and discusses possible further developments.
Pages: 210-213
Page count: 4