Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 3

Authors:

He, Ying ^{[1]}

Fang, Jingcheng ^{[1]}

Yu, F. Richard ^{[1,2]}

Leung, Victor C. ^{[3]}

Affiliations:

[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China

[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada

[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver V6T 1Z4, BC, Canada

Source:

IEEE TRANSACTIONS ON MOBILE COMPUTING | 2024 / Vol. 23 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;

DOI:

10.1109/TMC.2024.3415661

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

With the increasing popularity and demands for large language model applications on mobile devices, it is difficult for resource-limited mobile terminals to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload large language models (LLMs) inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and non-adaptability to task load variations, which will degrade the performance of LLMs. In this paper, we propose a novel approach based on active inference for LLMs inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method has superior performance over mainstream DRLs, improves in data utilization efficiency, and is more adaptable to changing task load scenarios.


Pages: 11253-11264

Page count: 12


