Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things

Cited by: 1
Authors
Yuan, Xiaoming [1 ,2 ]
Kong, Weixuan [1 ]
Luo, Zhenyu [1 ]
Xu, Minrui [3 ]
Affiliations
[1] Northeastern Univ Qinhuangdao, Hebei Key Lab Marine Percept Network & Data Proc, Qinhuangdao 066004, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
large language models; efficient inference offloading; mixture-of-experts; Internet of Medical Things;
DOI
10.3390/electronics13112077
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Despite recent significant advances in large language models (LLMs) for medical services, deployment difficulties hinder complex LLM-based medical applications in the Internet of Medical Things (IoMT), and people are increasingly concerned about e-healthcare risks and privacy protection. Existing LLMs struggle to provide accurate answers to medical questions (Q&A) while meeting the resource constraints of IoMT deployment. To address these challenges, we propose MedMixtral 8x7B, a new medical LLM based on the mixture-of-experts (MoE) architecture with an offloading strategy, enabling deployment in the IoMT and improving privacy protection for users. Additionally, we find that the significant factors affecting latency include the method of device interconnection, the location of offloading servers, and disk speed.
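To illustrate the core idea the abstract names, the sketch below shows MoE routing combined with expert offloading: a gate selects the top-k experts per input, and expert weights are loaded on demand with a small LRU-managed resident set, mimicking how offloading keeps most experts on disk or host RAM. All names (`Expert`, `MoELayer`, the gating scheme, the cache size) are hypothetical simplifications for illustration, not the paper's actual MedMixtral implementation.

```python
# Hypothetical toy sketch of MoE routing with on-demand expert offloading.
import math

class Expert:
    """A toy 'expert': scales each input element by a fixed weight."""
    def __init__(self, weight):
        self.weight = weight
        self.in_memory = False  # simulates weights residing on disk/host RAM

    def load(self):
        # In a real system this would copy weights from disk/CPU to the GPU.
        self.in_memory = True

    def unload(self):
        self.in_memory = False

    def forward(self, x):
        assert self.in_memory, "expert must be loaded before use"
        return [v * self.weight for v in x]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class MoELayer:
    """Routes each input to its top-k experts, loading them on demand and
    keeping at most `cache_size` experts resident (LRU eviction)."""
    def __init__(self, experts, gate_weights, top_k=2, cache_size=2):
        self.experts = experts
        self.gate_weights = gate_weights  # one gating weight per expert
        self.top_k = top_k
        self.cache_size = cache_size
        self.cache = []  # indices of resident experts, oldest first

    def _ensure_loaded(self, idx):
        if idx in self.cache:
            self.cache.remove(idx)           # refresh LRU position
        else:
            if len(self.cache) >= self.cache_size:
                evicted = self.cache.pop(0)  # evict least recently used
                self.experts[evicted].unload()
            self.experts[idx].load()
        self.cache.append(idx)

    def forward(self, x):
        # Gate: score each expert, pick the top-k, softmax their scores.
        scores = [g * sum(x) for g in self.gate_weights]
        top = sorted(range(len(scores)),
                     key=lambda i: scores[i], reverse=True)[: self.top_k]
        probs = softmax([scores[i] for i in top])
        # Output is the gate-weighted sum of the selected experts' outputs.
        out = [0.0] * len(x)
        for p, idx in zip(probs, top):
            self._ensure_loaded(idx)
            for j, v in enumerate(self.experts[idx].forward(x)):
                out[j] += p * v
        return out
```

The LRU cache is the offloading-relevant part: only the routed experts occupy accelerator memory at any time, which is what makes an 8-expert model deployable on resource-constrained IoMT hardware at the cost of load latency, consistent with the abstract's finding that interconnect, server location, and disk speed dominate latency.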
Pages: 17