TeleSpeechPT: Large-Scale Chinese Multi-dialect and Multi-accent Speech Pre-training

Times Cited: 0
Authors
Chen, Hongjie [1 ]
Li, Zehan [1 ]
Xia, Guangmin [1 ]
Liu, Boqing [1 ]
Yang, Yan [1 ]
Kang, Jian [1 ]
Li, Jie [1 ]
Affiliations
[1] China Telecom, Institute of Artificial Intelligence (TeleAI), Beijing, People's Republic of China
Keywords
Speech Pre-training; Accented ASR; Dialectal ASR
DOI
10.1007/978-981-96-1045-7_15
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We train Data2Vec2 models at several parameter scales on 300,000 hours of unannotated Chinese multi-dialect and multi-accent speech data. The models are validated on multiple speech recognition datasets, both by fine-tuning them and by using them as feature extractors for CTC-based automatic speech recognition. We release these models to the open-source community to facilitate research on, and applications of, speech processing technologies for Chinese dialects and accents.
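The abstract describes two evaluation modes: end-to-end fine-tuning, and using the pre-trained encoder as a feature extractor feeding a CTC recognizer. Below is a minimal PyTorch sketch of the second mode, not the authors' code: the encoder is a placeholder module standing in for a pre-trained Data2Vec2 checkpoint, the feature dimension and vocabulary size are illustrative, and freezing the encoder is an assumption about the feature-extractor setting.

```python
# A minimal sketch of the feature-extractor + CTC setup described in the
# abstract. The encoder below is a placeholder for a pre-trained Data2Vec2
# model; all sizes are illustrative.
import torch
import torch.nn as nn

class CTCHead(nn.Module):
    """Linear projection from encoder features to vocabulary logits for CTC."""
    def __init__(self, feature_dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(feature_dim, vocab_size + 1)  # +1 for CTC blank

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim) -> per-frame log-probabilities
        return self.proj(features).log_softmax(dim=-1)

# Hypothetical frozen encoder standing in for a pre-trained Data2Vec2 model.
encoder = nn.GRU(input_size=80, hidden_size=768, batch_first=True)  # placeholder
for p in encoder.parameters():
    p.requires_grad = False  # feature-extractor setting: encoder stays frozen

head = CTCHead(feature_dim=768, vocab_size=5000)  # vocab size is illustrative
ctc_loss = nn.CTCLoss(blank=5000, zero_infinity=True)

# One illustrative training step on dummy data.
feats = torch.randn(2, 200, 80)            # (batch, frames, mel bins)
with torch.no_grad():
    hidden, _ = encoder(feats)             # frozen forward pass
log_probs = head(hidden)                   # (batch, time, vocab+1)
targets = torch.randint(0, 5000, (2, 30))  # dummy token targets
input_lens = torch.full((2,), 200, dtype=torch.long)
target_lens = torch.full((2,), 30, dtype=torch.long)
# CTCLoss expects (time, batch, classes), hence the transpose.
loss = ctc_loss(log_probs.transpose(0, 1), targets, input_lens, target_lens)
loss.backward()  # gradients flow only into the CTC head
```

In practice the placeholder encoder would be replaced by one of the released pre-trained models; the rest of the pipeline (CTC head and loss) stays the same.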
Pages: 183-190
Page Count: 8