50 records in total
- [41] MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts. Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 14868-14879.
- [42] Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 15, 2024, pp. 16812-16820.
- [45] Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks. International Conference on Machine Learning, Vol. 202, 2023.
- [47] Layer-Condensed KV Cache for Efficient Inference of Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, 2024, pp. 11175-11188.
- [48] Tabi: An Efficient Multi-Level Inference System for Large Language Models. Proceedings of the Eighteenth European Conference on Computer Systems, EuroSys 2023, 2023, pp. 233-248.
- [49] An efficient quantized GEMV implementation for large language models inference with matrix core. Journal of Supercomputing, 2025, 81(3).
- [50] Generative Inference of Large Language Models in Edge Computing: An Energy Efficient Approach. 20th International Wireless Communications & Mobile Computing Conference, IWCMC 2024, 2024, pp. 244-249.