Mandarin–English code-switching speech corpus in South-East Asia: SEAME

Authors
Dau-Cheng Lyu
Tien-Ping Tan
Eng-Siong Chng
Haizhou Li
Affiliations
[1] Nanyang Technological University, Temasek Laboratories
[2] Nanyang Technological University, School of Computer Engineering
[3] Institute for Infocomm Research, School of Computer Sciences
[4] Universiti Sains Malaysia
[5] The University of New South Wales
Keywords
Code-switching speech; Spontaneous spoken corpus development; Mandarin–English; Speech recognition; Language recognition
DOI
Not available
Abstract
This paper introduces the South East Asia Mandarin–English (SEAME) corpus, a 63-hour transcribed corpus of spontaneous Mandarin–English code-switching speech suitable for LVCSR and language change detection/identification research. The corpus was recorded in unscripted interview and conversational settings from 157 Singaporean and Malaysian speakers who mixed Mandarin and English within a single sentence. About 82% of the transcribed utterances are intra-sentential code-switching speech, and the corpus will be released by LDC in 2015. This paper presents an analysis of the code-switching statistics of the corpus, such as the duration of monolingual segments and the frequency of language turns in code-switch utterances. We also summarize the development effort, including details such as the processing time for transcription, validation and language boundary labelling. Lastly, we present textual analyses of code-switch segments, examining the word length of monolingual segments in code-switch utterances and the most common single-word and two-word phrases in such segments.
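The statistics named in the abstract (word length of monolingual segments, number of language turns per utterance) can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names, the character-range rule for deciding that a token is Mandarin, and the example utterance are all assumptions made for illustration, and the utterance is invented rather than taken from the corpus.

```python
from itertools import groupby


def token_language(token: str) -> str:
    """Label a token 'zh' if it contains a CJK Unified Ideograph, else 'en'.

    Assumption: a simple Unicode-range test stands in for the manual
    language boundary labelling described in the abstract.
    """
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in token) else "en"


def monolingual_segments(tokens):
    """Group consecutive same-language tokens into (language, word_count) runs."""
    return [(lang, len(list(run))) for lang, run in groupby(tokens, key=token_language)]


def language_turns(tokens):
    """Count language switches: one fewer than the number of monolingual runs."""
    return max(len(monolingual_segments(tokens)) - 1, 0)


# Invented, pre-tokenized example utterance (not from SEAME).
example = ["我", "觉得", "这个", "project", "很", "challenging"]
segments = monolingual_segments(example)
turns = language_turns(example)
```

Under this sketch, an utterance with at least one language turn would count as intra-sentential code-switching; a real pipeline would rely on the corpus's own language boundary labels rather than a character-range heuristic.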
Pages: 581 - 600
Page count: 19
Related papers
50 in total
  • [21] Code-switching in South Asian English CMC
    Shakir, Muhammad
    Deuber, Dagmar
    ENGLISH WORLD-WIDE, 2024, 45 (03) : 311 - 341
  • [22] A corpus investigation of the typology of code-switching between closely related languages: Data from Mandarin-Taiwanese code-switching
    Hsiao, Chien-Han
    INTERNATIONAL JOURNAL OF BILINGUALISM, 2024,
  • [23] Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
    Guo, Pengcheng
    Xu, Haihua
    Xie, Lei
    Chng, Eng Siong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1928 - 1932
  • [24] Investigating Multi-task Learning for Automatic Speech Recognition with Code-switching between Mandarin and English
    Song, Xiao
    Zou, Yuexian
    Huang, Shilei
    Chen, Shaobin
    Liu, Yi
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 27 - 30
  • [25] Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech
    Gupta, Shashi Kant
    Hiray, Sushant
    Kukde, Prashant
    INTERSPEECH 2023, 2023, : 4114 - 4118
  • [26] Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
    Fan, Zhiyun
    Dong, Linhao
    Shen, Chen
    Liang, Zhenlin
    Zhang, Jun
    Lu, Lu
    Ma, Zejun
    INTERSPEECH 2023, 2023, : 3322 - 3326
  • [27] Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching
    Li, Chia-Yu
    Ngoc Thang Vu
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 160 - 165
  • [28] JAPANESE-ENGLISH CODE-SWITCHING SPEECH DATA CONSTRUCTION
    Nakayama, Sahoko
    Kano, Takatomo
    Quoc Truong Do
    Sakti, Sakriani
    Nakamura, Satoshi
    2018 ORIENTAL COCOSDA - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2018, : 67 - 71
  • [29] Acoustic modeling for Thai-English code-switching speech
    Chunwijitra, Vataya
    Thatphithakkul, Sumonmas
    Chootrakool, Patcharika
    Kasuriya, Sawit
    PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020), 2020, : 94 - 99
  • [30] Code-switching in reported speech
    Leisiö, L
    SELECTED PAPERS FROM THE 6TH INTERNATIONAL PRAGMATICS CONFERENCE, VOL 2: PRAGMATICS IN 1998, 1999, : 349 - 362