50 records in total
- [41] MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts. Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 14868-14879.
- [42] Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 15, 2024, pp. 16812-16820.
- [45] Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks. International Conference on Machine Learning, Vol. 202, 2023.
- [47] Layer-Condensed KV Cache for Efficient Inference of Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, 2024, pp. 11175-11188.
- [48] Tabi: An Efficient Multi-Level Inference System for Large Language Models. Proceedings of the Eighteenth European Conference on Computer Systems, EuroSys 2023, 2023, pp. 233-248.
- [49] An efficient quantized GEMV implementation for large language models inference with matrix core. Journal of Supercomputing, 2025, 81(3).
- [50] Generative Inference of Large Language Models in Edge Computing: An Energy Efficient Approach. 20th International Wireless Communications & Mobile Computing Conference, IWCMC 2024, 2024, pp. 244-249.