TeleSpeechPT: Large-Scale Chinese Multi-dialect and Multi-accent Speech Pre-training

Times Cited: 0
Authors
Chen, Hongjie [1 ]
Li, Zehan [1 ]
Xia, Guangmin [1 ]
Liu, Boqing [1 ]
Yang, Yan [1 ]
Kang, Jian [1 ]
Li, Jie [1 ]
Affiliations
[1] China Telecom, Institute of Artificial Intelligence (TeleAI), Beijing, People's Republic of China
Keywords
Speech Pre-training; Accented ASR; Dialectal ASR
DOI
10.1007/978-981-96-1045-7_15
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We train Data2Vec2 models at several parameter scales on 300,000 hours of unannotated Chinese multi-dialect and multi-accent speech data. The models are validated on multiple speech recognition datasets, both by fine-tuning them and by using them as feature extractors for CTC-based automatic speech recognition. We release these models to the open-source community to facilitate research on, and applications of, speech processing technologies for Chinese dialects and accents.
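The abstract describes two evaluation modes: end-to-end fine-tuning, and using the pre-trained encoder as a feature extractor feeding a CTC recognizer. Below is a minimal PyTorch sketch of the second mode, not the authors' code: the encoder is a placeholder module standing in for a pre-trained Data2Vec2 checkpoint, the feature dimension and vocabulary size are illustrative, and freezing the encoder is an assumption about the feature-extractor setting.

```python
# A minimal sketch of the feature-extractor + CTC setup described in the
# abstract. The encoder below is a placeholder for a pre-trained Data2Vec2
# model; all sizes are illustrative.
import torch
import torch.nn as nn

class CTCHead(nn.Module):
    """Linear projection from encoder features to vocabulary logits for CTC."""
    def __init__(self, feature_dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(feature_dim, vocab_size + 1)  # +1 for CTC blank

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim) -> per-frame log-probabilities
        return self.proj(features).log_softmax(dim=-1)

# Hypothetical frozen encoder standing in for a pre-trained Data2Vec2 model.
encoder = nn.GRU(input_size=80, hidden_size=768, batch_first=True)  # placeholder
for p in encoder.parameters():
    p.requires_grad = False  # feature-extractor setting: encoder stays frozen

head = CTCHead(feature_dim=768, vocab_size=5000)  # vocab size is illustrative
ctc_loss = nn.CTCLoss(blank=5000, zero_infinity=True)

# One illustrative training step on dummy data.
feats = torch.randn(2, 200, 80)            # (batch, frames, mel bins)
with torch.no_grad():
    hidden, _ = encoder(feats)             # frozen forward pass
log_probs = head(hidden)                   # (batch, time, vocab+1)
targets = torch.randint(0, 5000, (2, 30))  # dummy token targets
input_lens = torch.full((2,), 200, dtype=torch.long)
target_lens = torch.full((2,), 30, dtype=torch.long)
# CTCLoss expects (time, batch, classes), hence the transpose.
loss = ctc_loss(log_probs.transpose(0, 1), targets, input_lens, target_lens)
loss.backward()  # gradients flow only into the CTC head
```

In practice the placeholder encoder would be replaced by one of the released pre-trained models; the rest of the pipeline (CTC head and loss) stays the same.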
Pages: 183-190
Page Count: 8