An Empirical Study on Punctuation Restoration for English, Mandarin, and Code-Switching Speech

Cited by: 0
Authors
Liu, Changsong [1]
Thi Nga Ho [1]
Chng, Eng Siong [1]
Affiliations
[1] Nanyang Technological University, Singapore
Funding
National Research Foundation, Singapore
Keywords
Punctuation Restoration; Multilingual; Codeswitching; Automatic Speech Recognition; Singaporean Speech;
DOI
10.1007/978-981-99-5837-5_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Punctuation restoration is a crucial task in enriching automated transcripts produced by Automatic Speech Recognition (ASR) systems. This paper presents an empirical study of how different data acquisition and training strategies affect the performance of punctuation restoration models for multilingual and codeswitching speech. The study focuses on two of the most widely spoken languages in Singapore, English and Mandarin, in both monolingual and codeswitching forms. Specifically, we experiment with in-domain and out-of-domain evaluation for multilingual and codeswitching speech. We then enlarge the training data by sampling the codeswitching corpus with reordered conversational transcripts. We also propose ensembling the prediction models by averaging saved model checkpoints, instead of using only the last checkpoint, to improve model performance. The model employs a slot-filling approach to predict the punctuation at each word boundary. By utilizing and enlarging the available datasets and ensembling different model checkpoints, the model reaches F1 scores of 76.5% and 79.5% on the monolingual and codeswitching test sets respectively, which exceeds state-of-the-art performance. This investigation contributes to the existing literature on punctuation restoration for multilingual and codeswitching speech, and it offers insights into the importance of averaging model checkpoints for improving the final model's performance. Source code and trained models are published in our GitHub repository for future replication and use (https://github.com/charlieliu331/Punctuation_Restoration).
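The checkpoint averaging mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "averaging" means a parameter-wise arithmetic mean over checkpoints saved during one training run; the abstract does not specify the exact procedure, so this is an assumption rather than the authors' implementation. The `Checkpoint` type and the `average_checkpoints` function are hypothetical names introduced only for this illustration, and each parameter is represented as a flat list of floats to keep the example free of framework dependencies.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flattened weight values.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: Sequence[Checkpoint]) -> Checkpoint:
    """Return the parameter-wise arithmetic mean of several checkpoints.

    Assumption: all checkpoints come from the same architecture, so they
    share parameter names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")

    reference = checkpoints[0]
    for ckpt in checkpoints[1:]:
        # Every checkpoint must expose exactly the same parameter names.
        if set(ckpt) != set(reference):
            raise ValueError("checkpoints have different parameter names")

    count = len(checkpoints)
    averaged: Checkpoint = {}
    for name, values in reference.items():
        for ckpt in checkpoints[1:]:
            if len(ckpt[name]) != len(values):
                raise ValueError(f"shape mismatch for parameter {name!r}")
        # Average each weight position across all checkpoints.
        averaged[name] = [
            sum(ckpt[name][i] for ckpt in checkpoints) / count
            for i in range(len(values))
        ]
    return averaged


if __name__ == "__main__":
    # Toy checkpoints standing in for saved model states.
    ckpt_a = {"classifier.weight": [1.0, 3.0], "classifier.bias": [0.0]}
    ckpt_b = {"classifier.weight": [3.0, 5.0], "classifier.bias": [1.0]}
    print(average_checkpoints([ckpt_a, ckpt_b]))
```

In a real training setup the same idea would be applied to the tensors in each saved model state, and the averaged weights would be loaded into a single model for evaluation in place of the last checkpoint.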
Pages: 286-296
Number of pages: 11
Related papers
50 in total
  • [31] Bi-encoder Transformer Network for Mandarin-English Code-switching Speech Recognition using Mixture of Experts
    Lu, Yizhou
    Huang, Mingkun
    Li, Hao
    Guo, Jiaqi
    Qian, Yanmin
    INTERSPEECH 2020, 2020, : 4766 - 4770
  • [32] RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-switching Speech Recognition
    Zhang, Shuai
    Yi, Jiangyan
    Tian, Zhengkun
    Tao, Jianhua
    Bai, Ye
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [33] Study on Teachers' Code-switching in Specialized English Class
    Shi, Weixuan
    Shen, Chen
    2014 4TH INTERNATIONAL CONFERENCE ON APPLIED SOCIAL SCIENCE (ICASS 2014), PT 1, 2014, 51 : 44 - 47
  • [34] The study of the perception of code-switching to English in German advertising
    Zhiganova, Anna V.
    INTERNATIONAL CONFERENCE ON COMMUNICATION IN MULTICULTURAL SOCIETY, (CMSC 2015), 2016, 236 : 225 - 229
  • [35] Hybrid CTC Language Identification Structure for Mandarin-English Code-Switching ASR
    Yin, Hengxin
    Hu, Guangyu
    Wang, Fei
    Ren, Pengfei
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 537 - 541
  • [36] Code-switching in Indic Speech Synthesisers
    Thomas, Anju Leela
    Prakash, Anusha
    Baby, Arun
    Murthy, Hema A.
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1948 - 1952
  • [37] MERLIon CCS Challenge: An English-Mandarin code-switching child-directed speech corpus for language identification and diarization
    Chua, Victoria Y. H.
    Liu, Hexin
    Perera, Leibny Paola Garcia
    Woon, Fei Ting
    Wong, Jinyi
    Zhang, Xiangyu
    Khudanpur, Sanjeev
    Khong, Andy W. H.
    Dauwels, Justin
    Styles, Suzy J.
    INTERSPEECH 2023, 2023, : 4109 - 4113
  • [38] Code-switching in medieval English drama
    Diller, HJ
    COMPARATIVE DRAMA, 1997, 31 (04) : 506 - 537
  • [39] Code-Switching and College English Teaching
    Bo, Li
    PROCEEDINGS OF THE SIXTH NORTHEAST ASIA INTERNATIONAL SYMPOSIUM ON LANGUAGE, LITERATURE AND TRANSLATION, 2017, : 724 - 729
  • [40] CODE-SWITCHING - HINDI-ENGLISH
    VERMA, SK
    LINGUA, 1976, 38 (02) : 153 - 165