Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 3
Authors
He, Ying [1]
Fang, Jingcheng [1]
Yu, F. Richard [1,2]
Leung, Victor C. [3]
Institutions
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver, BC V6T 1Z4, Canada
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;
DOI
10.1109/TMC.2024.3415661
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
With the increasing popularity of and demand for large language model (LLM) applications on mobile devices, it is difficult for resource-limited mobile terminals to run LLM inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and poor adaptability to task-load variations, all of which degrade LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method outperforms mainstream DRL algorithms, uses data more efficiently, and adapts better to changing task-load scenarios.
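The abstract does not detail the mechanism, so the following is purely an illustrative sketch of the general idea behind active-inference-based offloading (every name, distribution, and number below is hypothetical, not taken from the paper): the agent scores each candidate target by an expected free energy made of a risk term, penalizing predicted latency far from a preferred outcome, plus an ambiguity term, penalizing uncertain predictions, then updates its generative model from observed latencies.

```python
import math

# Toy active-inference offloading sketch. All targets, priors, and
# constants are hypothetical placeholders, not values from the paper.

TARGETS = ["local", "edge", "cloud"]

# Agent's generative model: believed mean/std of task latency (ms) per target.
beliefs = {
    "local": {"mean": 900.0, "std": 200.0},
    "edge":  {"mean": 300.0, "std": 120.0},
    "cloud": {"mean": 500.0, "std": 300.0},
}

PREFERRED_LATENCY = 250.0  # prior preference: outcomes near this are "expected"

def expected_free_energy(target):
    """Risk (squared deviation of the predictive distribution from the
    preferred latency) plus ambiguity (entropy of that distribution)."""
    b = beliefs[target]
    risk = ((b["mean"] - PREFERRED_LATENCY) ** 2 + b["std"] ** 2) / (
        2 * PREFERRED_LATENCY ** 2
    )
    ambiguity = 0.5 * math.log(2 * math.pi * math.e * b["std"] ** 2)
    return risk + ambiguity

def choose_target():
    """Act so as to minimize expected free energy."""
    return min(TARGETS, key=expected_free_energy)

def update_beliefs(target, observed_latency, lr=0.2):
    """Simple prediction-error update of the generative model."""
    b = beliefs[target]
    err = observed_latency - b["mean"]
    b["mean"] += lr * err
    b["std"] += lr * (abs(err) - b["std"])

print(choose_target())  # with these priors, "edge" minimizes expected free energy
```

Unlike a reward-maximizing DRL policy, this decision rule explicitly trades off expected cost against predictive uncertainty, which is one intuition behind the abstract's data-efficiency and adaptability claims.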
Pages: 11253-11264 (12 pages)
Related Papers (50 total)
  • [1] Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Networks: An Active Inference Approach
    Fang, Jingcheng
    He, Ying
    Yu, F. Richard
    Li, Jianqiang
    Leung, Victor C.
    2023 IEEE 98TH VEHICULAR TECHNOLOGY CONFERENCE, VTC2023-FALL, 2023
  • [2] Incentive-driven Computation Offloading and Resource Allocation in Mobile Cloud-Edge Computing
    Li, Mingze
    Wu, Tong
    Zhou, Huan
    Zhao, Liang
    Leung, Victor C. M.
    2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW), 2022, : 157 - 162
  • [3] Generative Inference of Large Language Models in Edge Computing: An Energy Efficient Approach
    Yuan, Xingyu
    Li, He
    Ota, Kaoru
    Dong, Mianxiong
    20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024, 2024, : 244 - 249
  • [4] An Adaptive Computing Offloading and Resource Allocation Strategy for Internet of Vehicles Based on Cloud-Edge Collaboration
    Shu, Wanneng
    Yu, Haoxin
    Zhai, Cao
    Feng, Xuanxuan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024
  • [5] Reverse Auction-Based Computation Offloading and Resource Allocation in Mobile Cloud-Edge Computing
    Zhou, Huan
    Wu, Tong
    Chen, Xin
    He, Shibo
    Guo, Deke
    Wu, Jie
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (10) : 6144 - 6159
  • [6] Multiuser Computation Offloading and Resource Allocation for Cloud-Edge Heterogeneous Network
    Chen, Qinglin
    Kuang, Zhufang
    Zhao, Lian
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (05) : 3799 - 3811
  • [7] Adaptive Data Sharing and Computation Offloading in Cloud-Edge Computing with Resource Constraints
    Chu, Wenjie
    Zhao, Haiyan
    Jin, Zhi
    Hu, Zhenjiang
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 2842 - 2849
  • [8] Energy-Efficient Cloud-Edge Collaborative Computing: Joint Task Offloading, Resource Allocation, and Service Caching
    Liang, Yong
    Sun, Haifeng
    Deng, Yunfeng
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT V, ICIC 2024, 2024, 14879 : 285 - 296
  • [9] Towards Blockchain-Based Resource Allocation Models for Cloud-Edge Computing in IoT Applications
    Liu, Xing
    WIRELESS PERSONAL COMMUNICATIONS, 2024, 135 (04) : 2483 - 2483
  • [10] Beyond the Cloud: Edge Inference for Generative Large Language Models in Wireless Networks
    Zhang, Xinyuan
    Nie, Jiangtian
    Huang, Yudong
    Xie, Gaochang
    Xiong, Zehui
    Liu, Jiang
    Niyato, Dusit
    Shen, Xuemin
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2025, 24 (01) : 643 - 658