Pre-Training With Whole Word Masking for Chinese BERT

Cited by: 807
Authors
Cui, Yiming [1]
Che, Wanxiang [1]
Liu, Ting [1]
Qin, Bing [1]
Yang, Ziqing [2]
Affiliations
[1] Harbin Inst Technol, Harbin 150001, Peoples R China
[2] iFLYTEK Res, State Key Lab Cognit Intelligence, Beijing 100010, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Bit error rate; Task analysis; Computational modeling; Training; Analytical models; Adaptation models; Predictive models; Pre-trained language model; representation learning; natural language processing;
DOI
10.1109/TASLP.2021.3124365
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Bidirectional Encoder Representations from Transformers (BERT) has brought remarkable improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways; in particular, it adopts a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as baselines, including BERT, RoBERTa, ELECTRA, and RBT. We carry out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and our ablation studies yield several findings that may help future research. We open-source our pre-trained language models to further support the research community.
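To make the whole word masking idea from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes the input sentence has already been split into words by an external Chinese word segmenter and that the model operates on character-level tokens, as Chinese BERT does. The function name whole_word_mask, the masking probability, and the example sentence are illustrative assumptions.

import random

MASK = "[MASK]"

def whole_word_mask(words, mask_prob=0.15, rng=None):
    """Mask at the word level: every character token of a selected word is masked.

    Illustrative sketch only; a real pre-training pipeline would also cap the
    total number of masked tokens and mix in random/unchanged replacements.
    """
    rng = rng or random.Random(12345)
    tokens, labels = [], []
    for word in words:
        chars = list(word)  # Chinese BERT tokenizes to individual characters
        if rng.random() < mask_prob:
            tokens.extend([MASK] * len(chars))  # mask ALL characters of the word
            labels.extend(chars)                # targets are the original characters
        else:
            tokens.extend(chars)
            labels.extend([None] * len(chars))  # not a prediction target
    return tokens, labels

if __name__ == "__main__":
    # Hypothetical segmentation of "使用语言模型来预测下一个词的概率"
    segmented = ["使用", "语言", "模型", "来", "预测", "下一个", "词", "的", "概率"]
    tokens, labels = whole_word_mask(segmented, mask_prob=0.3)
    print(" ".join(tokens))

Under character-level masking, the two characters of a word such as 模型 could be masked independently; under wwm they are always masked together. MacBERT's Mac strategy goes a step further by replacing the selected tokens with similar words rather than the artificial [MASK] token, which is what the abstract calls MLM as correction.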
Pages: 3504-3514
Number of pages: 11