50 entries in total
- [21] MergeDistill: Merging Pre-trained Language Models using Distillation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 2874-2887.
- [22] CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade. Findings of the Association for Computational Linguistics: EMNLP 2021, 2021, pp. 475-486.
- [23] Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation. Interspeech 2023, 2023, pp. 1364-1368.
- [24] A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models. Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 11239-11246.
- [25] KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation. NAACL 2022: The 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022, pp. 2116-2127.
- [26] Pre-trained Language Model Representations for Language Generation. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Vol. 1, 2019, pp. 4052-4059.
- [27] One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 4408-4413.
- [28] Classifying Code Comments via Pre-trained Programming Language Model. 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), 2023, pp. 24-27.
- [29] Development of a baseline model for MAX/MXene synthesis recipes extraction via pre-trained model with domain knowledge. Journal of Materials Research and Technology (JMR&T), 2023, 22: 2262-2274.
- [30] Probing Pre-Trained Language Models for Disease Knowledge. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 3023-3033.