50 items in total
- [1] GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
- [2] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 6159-6172
- [4] Adaptive Gating in Mixture-of-Experts based Language Models. 2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 3577-3587
- [5] Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021: 3577-3599
- [7] Efficient Routing in Sparse Mixture-of-Experts. Shamsolmoali, Pourya (pshams55@gmail.com), 1600, Institute of Electrical and Electronics Engineers Inc.