Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 3

Authors:

He, Ying [1]

Fang, Jingcheng [1]

Yu, F. Richard [1,2]

Leung, Victor C. [3]

Affiliations:

[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China

[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada

[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver V6T 1Z4, BC, Canada

Source:

IEEE TRANSACTIONS ON MOBILE COMPUTING | 2024 / Vol. 23 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;

DOI:

10.1109/TMC.2024.3415661

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

With the growing popularity of and demand for large language model (LLM) applications on mobile devices, resource-limited mobile terminals struggle to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and poor adaptability to task load variations, all of which degrade LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that the proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and adapts better to changing task loads.

Pages: 11253 - 11264

Page count: 12

50 entries in total

[21] Computation Offloading and Resource Allocation For Cloud Assisted Mobile Edge Computing in Vehicular Networks
Zhao, Junhui
Li, Qiuping
Gong, Yi
Zhang, Ke
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2019, 68 (08) : 7944 - 7956
[22] QoS-Aware Augmented Reality Task Offloading and Resource Allocation in Cloud-Edge Collaboration Environment
Hao, Jia
Chen, Yang
Gan, Jianhou
JOURNAL OF NETWORK AND SYSTEMS MANAGEMENT, 2025, 33 (01)
[23] A Near-Optimal Approach for Online Task Offloading and Resource Allocation in Edge-Cloud Orchestrated Computing
Liu, Tong
Fang, Lu
Zhu, Yanmin
Tong, Weiqin
Yang, Yuanyuan
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2022, 21 (08) : 2687 - 2700
[24] HTR: A Joint Approach for Task Offloading and Resource Allocation in Mobile Edge Computing
Wang, Zilong
Du, Hongwei
Ye, Qiang
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021
[25] A Cloud-Edge Collaborative Computing Task Scheduling and Resource Allocation Algorithm for Energy Internet Environment
Song, Xin
Wang, Yue
Xie, Zhigang
Xia, Lin
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (06) : 2282 - 2303
[26] Edge-IoT Computing and Networking Resource Allocation for Decomposable Deep Learning Inference
Yang, Ya-Ting
Wei, Hung-Yu
IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (06) : 5178 - 5193
[27] Resource Allocation Strategy Using Deep Reinforcement Learning in Cloud-Edge Collaborative Computing Environment
Cen, Junjie
Li, Yongbo
MOBILE INFORMATION SYSTEMS, 2022, 2022
[28] A Cloud-edge Collaborative Framework for Computing Tasks Based on Load Forecasting and Resource Adaptive Allocation
Meng, Yu
Liu, Xingchuan
Chen, Jiaxi
Nie, Yongjie
2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 1120 - 1124
[29] Joint Computation Offloading and Resource Allocation in Mobile-Edge Cloud Computing: A Two-Layer Game Approach
He, Zhenli
Guo, Ying
Zhai, Xiaolong
Zhao, Mingxiong
Zhou, Wei
Li, Keqin
IEEE TRANSACTIONS ON CLOUD COMPUTING, 2025, 13 (01) : 411 - 428
[30] Delay-aware resource allocation for partial computation offloading in mobile edge cloud computing
Yu, Lingfei
Xu, Hongliu
Zeng, Yunhao
Deng, Jiali
PERVASIVE AND MOBILE COMPUTING, 2024, 105
