50 records in total
- [12] Symbolic Knowledge Distillation: From General Language Models to Commonsense Models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022), 2022: 4602-4625.
- [13] MoDE: A Mixture-of-Experts Model with Mutual Distillation Among the Experts. Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024), Vol. 38, No. 14, 2024: 16067-16075.
- [14] Effective Compression of Language Models by Combining Pruning and Knowledge Distillation. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC 2024), 2024: 429-438.
- [15] Shamsolmoali, Pourya. Efficient Routing in Sparse Mixture-of-Experts. Institute of Electrical and Electronics Engineers Inc.
- [16] Dynamic Knowledge Distillation for Pre-trained Language Models. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 379-389.
- [17] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers (ACL 2024), 2024: 6159-6172.
- [19] Scalable Syntax-Aware Language Models Using Knowledge Distillation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), 2019: 3472-3484.
- [20] Web Content Filtering through Knowledge Distillation of Large Language Models. 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2023), 2023: 357-361.