An Empirical Study on Punctuation Restoration for English, Mandarin, and Code-Switching Speech

Authors
Liu, Changsong [1]
Thi Nga Ho [1]
Chng, Eng Siong [1]
Affiliations
[1] Nanyang Technological University, Singapore, Singapore
Funding
National Research Foundation, Singapore;
Keywords
Punctuation Restoration; Multilingual; Codeswitching; Automatic Speech Recognition; Singaporean Speech;
DOI
10.1007/978-981-99-5837-5_24
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Punctuation restoration is a crucial task in enriching automated transcripts produced by Automatic Speech Recognition (ASR) systems. This paper presents an empirical study of how different data acquisition and training strategies affect the performance of punctuation restoration models for multilingual and codeswitching speech. The study focuses on two of the most widely spoken languages in Singapore, English and Mandarin, in both monolingual and codeswitching forms. Specifically, we experiment with in-domain and out-of-domain evaluation for multilingual and codeswitching speech. We then enlarge the training data by resampling the codeswitching corpus, reordering its conversational transcripts. We also propose ensembling the prediction models by averaging saved model checkpoints instead of using only the last checkpoint. The model employs a slot-filling approach to predict the punctuation at each word boundary. By utilizing and enlarging the available datasets and ensembling different model checkpoints, the model reaches F1 scores of 76.5% and 79.5% on the monolingual and codeswitching test sets respectively, exceeding the state-of-the-art performance. This investigation contributes to the existing literature on punctuation restoration for multilingual and codeswitching speech, and it offers insights into the importance of averaging model checkpoints for improving the final model's performance. Source code and trained models are published in our GitHub repository for future replication and use (https://github.com/charlieliu331/Punctuation_Restoration).
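The abstract mentions averaging saved model checkpoints rather than keeping only the last one. This record does not show how the authors implement it, so the following is only a minimal sketch of plain parameter averaging. It assumes each checkpoint is a dictionary mapping a parameter name to a flat list of floats. The function name `average_checkpoints` and the parameter names in the example are hypothetical, not taken from the paper or its repository.

```python
from typing import Dict, List

# Assumption: a checkpoint is a mapping from parameter name to a flat list of floats.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the element-wise mean of every parameter across the checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter from every checkpoint together.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / len(checkpoints) for values in columns]
    return averaged


# Two toy checkpoints with hypothetical parameter names.
ckpts = [
    {"classifier.weight": [1.0, 2.0], "classifier.bias": [0.0]},
    {"classifier.weight": [3.0, 4.0], "classifier.bias": [1.0]},
]
avg = average_checkpoints(ckpts)
```

With a real deep learning framework the same idea would typically be applied to the saved parameter tensors, and the averaged parameters would then be loaded into a single model for evaluation.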
Pages: 286-296
Number of pages: 11