TeleSpeechPT: Large-Scale Chinese Multi-dialect and Multi-accent Speech Pre-training

Cited by: 0
Authors
Chen, Hongjie [1 ]
Li, Zehan [1 ]
Xia, Guangmin [1 ]
Liu, Boqing [1 ]
Yang, Yan [1 ]
Kang, Jian [1 ]
Li, Jie [1 ]
Affiliations
[1] China Telecom, Institute of Artificial Intelligence (TeleAI), Beijing, People's Republic of China
Keywords
Speech Pre-training; Accented ASR; Dialectal ASR
DOI
10.1007/978-981-96-1045-7_15
CLC Classification Number
O42 [Acoustics]
Discipline Classification Code
070206; 082403
Abstract
We train Data2Vec2 models of various parameter scales on 300,000 h of unannotated Chinese multi-dialect and multi-accent speech data. These models are validated on multiple speech recognition datasets through fine-tuning and by utilizing them as feature extractors for CTC-based automatic speech recognition tasks. We are releasing these models to the open-source community to facilitate the research and application of speech processing technologies that support Chinese dialects and accents.
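The abstract notes that the pre-trained models are validated by fine-tuning and by using them as feature extractors for CTC-based ASR. As background on the decoding step of such a pipeline, here is a minimal sketch of greedy CTC decoding (the label ids and blank index below are hypothetical, not taken from the paper):

```python
def ctc_greedy_decode(frame_label_ids, blank_id=0):
    """Collapse repeated per-frame labels, then drop CTC blank tokens."""
    out = []
    prev = None
    for label in frame_label_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank_id:
            out.append(label)
        prev = label
    return out

# Per-frame argmax over a CTC output distribution (hypothetical values).
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_greedy_decode(frames))  # → [3, 3, 5]
```

The blank symbol lets CTC distinguish a repeated character (blank between two 3s) from one character held across frames (consecutive 3s collapsed).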
Pages: 183-190
Number of pages: 8
Related Papers
50 records in total
  • [1] Multi-Accent Chinese Speech Recognition
    Liu, Yi
    Fung, Pascale
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 133 - +
  • [2] Chinese Multi-Dialect Speech Recognition Based on Instruction Tuning
    Ding, Timin
    Sun, Kai
    Zhang, Xu
    Yu, Jian
    Huang, Degen
    FOURTH SYMPOSIUM ON PATTERN RECOGNITION AND APPLICATIONS, SPRA 2023, 2024, 13162
  • [3] Wav2vec-MoE: An unsupervised pre-training and adaptation method for multi-accent ASR
    Lin, Yuqin
    Zhang, Shiliang
    Gao, Zhifu
    Wang, Longbiao
    Yang, Yanbing
    Dang, Jianwu
    ELECTRONICS LETTERS, 2023, 59 (11)
  • [4] Prediction of chemical reaction yields with large-scale multi-view pre-training
    Shi, Runhan
    Yu, Gufeng
    Huo, Xiaohong
    Yang, Yang
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [6] Adaptive Attention Network with Domain Adversarial Training for Multi-Accent Speech Recognition
    Yang, Yanbing
    Shi, Hao
    Lin, Yuqin
    Ge, Meng
    Wang, Longbiao
    Hou, Qingzhi
    Dang, Jianwu
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 6 - 10
  • [7] Pre-training on Large-Scale Heterogeneous Graph
    Jiang, Xunqiang
    Jia, Tianrui
    Fang, Yuan
    Shi, Chuan
    Lin, Zhe
    Wang, Hui
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 756 - 766
  • [8] Large-Scale Unsupervised Audio Pre-Training for Video-to-Speech Synthesis
    Kefalas, Triantafyllos
    Panagakis, Yannis
    Pantic, Maja
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2255 - 2268
  • [9] WenLan: Efficient Large-Scale Multi-Modal Pre-Training on Real World Data
    Song, Ruihua
    MMPT '21: PROCEEDINGS OF THE 2021 WORKSHOP ON MULTI-MODAL PRE-TRAINING FOR MULTIMEDIA UNDERSTANDING, 2021, : 3 - 3
  • [10] ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation
    Qi, Weizhen
    Gong, Yeyun
    Yan, Yu
    Xu, Can
    Yao, Bolun
    Zhou, Bartuer
    Cheng, Biao
    Jiang, Daxin
    Chen, Jiusheng
    Zhang, Ruofei
    Li, Houqiang
    Duan, Nan
    ACL-IJCNLP 2021: THE JOINT CONFERENCE OF THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE SYSTEM DEMONSTRATIONS, 2021, : 232 - 239