Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 3
Authors
He, Ying [1]
Fang, Jingcheng [1]
Yu, F. Richard [1,2]
Leung, Victor C. [3]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver V6T 1Z4, BC, Canada
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;
DOI
10.1109/TMC.2024.3415661
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
With the increasing popularity of and demand for large language model applications on mobile devices, it is difficult for resource-limited mobile terminals to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload large language model (LLM) inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and non-adaptability to task load variations, which degrade the performance of LLMs. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and is more adaptable to changing task load scenarios.
Pages: 11253-11264
Number of pages: 12
Related papers
50 in total
  • [41] Decentralized Computation Offloading and Resource Allocation for Mobile-Edge Computing: A Matching Game Approach
    Quoc-Viet Pham
    Tuan Leanh
    Tran, Nguyen H.
    Park, Bang Ju
    Hong, Choong Seon
    IEEE ACCESS, 2018, 6 : 75868 - 75885
  • [42] Joint Offloading and Resource Allocation in Mobile Edge Computing Systems: An Actor-Critic Approach
    Zhang, Zhicai
    Yu, F. Richard
    Fu, Fang
    Yan, Qiao
    Wang, Zhouyang
    2018 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2018,
  • [43] A Quantum Reinforcement Learning Approach for Joint Resource Allocation and Task Offloading in Mobile Edge Computing
    Wei, Xinliang
    Gao, Xitong
    Ye, Kejiang
    Xu, Cheng-Zhong
    Wang, Yu
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (04) : 2580 - 2593
  • [44] Joint Task Offloading and Resource Allocation for Quality-Aware Edge-Assisted Machine Learning Task Inference
    Fan, Wenhao
    Chen, Zeyu
    Hao, Zhibo
    Wu, Fan
    Liu, Yuan'an
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (05) : 6739 - 6752
  • [45] Time-Slotted Task Offloading and Resource Allocation for Cloud-Edge-End Cooperative Computing Networks
    Fan, Wenhao
    Liu, Xun
    Yuan, Hao
    Li, Nan
    Liu, Yuan'an
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (08) : 8225 - 8241
  • [46] Toward Mobility-Aware Computation Offloading and Resource Allocation in End-Edge-Cloud Orchestrated Computing
    Dai, Bin
    Niu, Jianwei
    Ren, Tao
    Atiquzzaman, Mohammed
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (19) : 19450 - 19462
  • [47] Joint Computation Offloading and Resource Allocation for Hybrid Cloud and Edge Computing in Satellite-assisted Unmanned System
    Wu, Yeyu
    Fan, Huili
    Yu, Jian
    Yang, Li
    Huang, Qilong
    Qi, Yaowen
    39TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION, YAC 2024, 2024, : 1386 - 1391
  • [48] Joint Power Control and Resource Allocation With Task Offloading for Collaborative Device-Edge-Cloud Computing Systems
    Xie, Shumin
    Li, Kangshun
    Wang, Wenxiang
    Wang, Hui
    Jalil, Hassan
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2024, 2024
  • [49] Game Theory-Based Task Offloading and Resource Allocation for Vehicular Networks in Edge-Cloud Computing
    Jiang, Qinting
    Xu, Xiaolong
    He, Qiang
    Zhang, Xuyun
    Dai, Fei
    Qi, Lianyong
    Dou, Wanchun
    2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021, 2021, : 341 - 346
  • [50] Multi-resource maximin share fair allocation in the cloud-edge collaborative computing system with bandwidth demand compression
    Guo, Hao
    Deng, Bin
    Li, Weidong
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, 28 (02):