Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things

Cited by: 1
Authors
Yuan, Xiaoming [1 ,2 ]
Kong, Weixuan [1 ]
Luo, Zhenyu [1 ]
Xu, Minrui [3 ]
Affiliations
[1] Northeastern Univ Qinhuangdao, Hebei Key Lab Marine Percept Network & Data Proc, Qinhuangdao 066004, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
large language models; efficient inference offloading; mixture-of-experts; Internet of Medical Things;
DOI
10.3390/electronics13112077
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Despite recent significant advances in large language models (LLMs) for medical services, deployment difficulties hinder complex LLM-based medical applications in the Internet of Medical Things (IoMT), and people are increasingly concerned about e-healthcare risks and privacy protection. Existing LLMs struggle to provide accurate answers to medical questions (Q&A) while meeting the resource constraints of IoMT deployment. To address these challenges, we propose MedMixtral 8x7B, a new medical LLM based on the mixture-of-experts (MoE) architecture with an offloading strategy, enabling deployment in the IoMT and improving privacy protection for users. Additionally, we find that the significant factors affecting latency include the method of device interconnection, the location of offloading servers, and disk speed.
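To illustrate the core idea the abstract names, the sketch below shows MoE routing combined with expert offloading: a gate selects the top-k experts per input, and expert weights are loaded on demand with a small LRU-managed resident set, mimicking how offloading keeps most experts on disk or host RAM. All names (`Expert`, `MoELayer`, the gating scheme, the cache size) are hypothetical simplifications for illustration, not the paper's actual MedMixtral implementation.

```python
# Hypothetical toy sketch of MoE routing with on-demand expert offloading.
import math

class Expert:
    """A toy 'expert': scales each input element by a fixed weight."""
    def __init__(self, weight):
        self.weight = weight
        self.in_memory = False  # simulates weights residing on disk/host RAM

    def load(self):
        # In a real system this would copy weights from disk/CPU to the GPU.
        self.in_memory = True

    def unload(self):
        self.in_memory = False

    def forward(self, x):
        assert self.in_memory, "expert must be loaded before use"
        return [v * self.weight for v in x]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class MoELayer:
    """Routes each input to its top-k experts, loading them on demand and
    keeping at most `cache_size` experts resident (LRU eviction)."""
    def __init__(self, experts, gate_weights, top_k=2, cache_size=2):
        self.experts = experts
        self.gate_weights = gate_weights  # one gating weight per expert
        self.top_k = top_k
        self.cache_size = cache_size
        self.cache = []  # indices of resident experts, oldest first

    def _ensure_loaded(self, idx):
        if idx in self.cache:
            self.cache.remove(idx)           # refresh LRU position
        else:
            if len(self.cache) >= self.cache_size:
                evicted = self.cache.pop(0)  # evict least recently used
                self.experts[evicted].unload()
            self.experts[idx].load()
        self.cache.append(idx)

    def forward(self, x):
        # Gate: score each expert, pick the top-k, softmax their scores.
        scores = [g * sum(x) for g in self.gate_weights]
        top = sorted(range(len(scores)),
                     key=lambda i: scores[i], reverse=True)[: self.top_k]
        probs = softmax([scores[i] for i in top])
        # Output is the gate-weighted sum of the selected experts' outputs.
        out = [0.0] * len(x)
        for p, idx in zip(probs, top):
            self._ensure_loaded(idx)
            for j, v in enumerate(self.experts[idx].forward(x)):
                out[j] += p * v
        return out
```

The LRU cache is the offloading-relevant part: only the routed experts occupy accelerator memory at any time, which is what makes an 8-expert model deployable on resource-constrained IoMT hardware at the cost of load latency, consistent with the abstract's finding that interconnect, server location, and disk speed dominate latency.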
Pages: 17