Mandarin–English code-switching speech corpus in South-East Asia: SEAME

Cited by: 0

Authors:

Dau-Cheng Lyu

Tien-Ping Tan

Eng-Siong Chng

Haizhou Li

Affiliations:

[1] Nanyang Technological University, Temasek Laboratories

[2] Nanyang Technological University, School of Computer Engineering

[3] Institute for Infocomm Research, School of Computer Sciences

[4] Universiti Sains Malaysia

[5] The University of New South Wales

Source:

Language Resources and Evaluation | 2015 / Vol. 49

Keywords:

Code-switching speech; Spontaneous spoken corpus development; Mandarin–English; Speech recognition; Language recognition;

DOI: Not available

Abstract:

This paper introduces the South East Asia Mandarin–English corpus, a 63-hour spontaneous Mandarin–English code-switching transcribed speech corpus suitable for LVCSR and language change detection/identification research. The corpus was recorded in unscripted interview and conversational settings from 157 Singaporean and Malaysian speakers who spoke a mixture of Mandarin and English within a single sentence. About 82% of the transcribed utterances are intra-sentential code-switching speech, and the corpus will be released by LDC in 2015. This paper presents an analysis of the code-switching statistics of the corpus, such as the duration of monolingual segments and the frequency of language turns in code-switch utterances. We also summarize the development effort, including details such as the processing time for transcription, validation and language boundary labelling. Lastly, we present textual analyses of code-switch segments, examining the word length of monolingual segments in code-switch utterances and the most common single words and two-word phrases in such segments.

Pages: 581-600

Number of pages: 19

50 items in total

[1] Mandarin-English code-switching speech corpus in South-East Asia: SEAME
Lyu, Dau-Cheng
Tan, Tien-Ping
Chng, Eng-Siong
Li, Haizhou
LANGUAGE RESOURCES AND EVALUATION, 2015, 49 (03) : 581 - 600
[2] SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia
Lyu, Dau-Cheng
Tan, Tien-Ping
Chng, Eng-Siong
Li, Haizhou
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1986 - +
[3] A Review of the Mandarin-English Code-switching Corpus: SEAME
Lee, Grandee
Ho, Thi-Nga
Chng, Eng-Siong
Li, Haizhou
2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 210 - 213
[4] A Mandarin-English Code-Switching Corpus
Li, Ying
Yu, Yue
Fung, Pascale
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2515 - 2519
[5] Mandarin-English Code-switching Speech Recognition
Xu, Haihua
Van Tung Pham
Kyaw, Zin Tun
Lim, Zhi Hao
Chng, Eng Siong
Li, Haizhou
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 554 - 555
[6] TALCS: AN OPEN-SOURCE MANDARIN-ENGLISH CODE-SWITCHING CORPUS AND A SPEECH RECOGNITION BASELINE
Li, Chengfei
Deng, Shuhao
Wang, Yaoping
Wang, Guangjing
Gong, Yaguang
Chen, Changbin
Bai, Jinfeng
INTERSPEECH 2022, 2022, : 1741 - 1745
[7] Pronunciation augmentation for Mandarin-English code-switching speech recognition
Long, Yanhua
Wei, Shuang
Lian, Jie
Li, Yijie
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
[8] Pronunciation augmentation for Mandarin-English code-switching speech recognition
Yanhua Long
Shuang Wei
Jie Lian
Yijie Li
EURASIP Journal on Audio, Speech, and Music Processing, 2021
[9] An Empirical Study on Punctuation Restoration for English, Mandarin, and Code-Switching Speech
Liu, Changsong
Thi Nga Ho
Chng, Eng Siong
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2023, PT II, 2023, 13996 : 286 - 296
[10] NON-AUTOREGRESSIVE MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
Chuang, Shun-Po
Chang, Heng-Jui
Huang, Sung-Feng
Lee, Hung-yi
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 465 - 472