TeleSpeechPT: Large-Scale Chinese Multi-dialect and Multi-accent Speech Pre-training

Cited by: 0
Authors
Chen, Hongjie [1]
Li, Zehan [1]
Xia, Guangmin [1]
Liu, Boqing [1]
Yang, Yan [1]
Kang, Jian [1]
Li, Jie [1]
Affiliations
[1] China Telecom, Institute of Artificial Intelligence (TeleAI), Beijing, People's Republic of China
Keywords
Speech Pre-training; Accented ASR; Dialectal ASR
DOI
10.1007/978-981-96-1045-7_15
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We train Data2Vec2 models at several parameter scales on 300,000 hours of unannotated Chinese multi-dialect and multi-accent speech data. We validate these models on multiple speech recognition datasets in two ways: by fine-tuning them, and by using them as feature extractors for CTC-based automatic speech recognition. We are releasing the models to the open-source community to facilitate research on, and application of, speech processing technologies that support Chinese dialects and accents.
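As a rough illustration of the CTC-based recognition step mentioned in the abstract, the sketch below shows greedy CTC decoding on toy data. It is not the authors' code: the function name `ctc_greedy_decode`, the blank index, and the per-frame scores are assumptions made for illustration, with the scores standing in for the output of a classifier placed on top of pre-trained speech features.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Turn per-frame label scores into an output label sequence."""
    # Pick the highest-scoring label for each frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # Merge consecutive repeats, then drop blank symbols.
    return [label for label, _ in groupby(best) if label != blank]


# Toy scores for 6 frames over a 3-symbol vocabulary (0 = blank).
scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
    [0.2, 0.1, 0.7],
]
decoded = ctc_greedy_decode(scores)
```

The blank frame between the two runs of label 1 is what lets the decoder keep both occurrences instead of merging them into one.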
Pages: 183-190
Number of pages: 8