VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

Cited by: 1
Authors
Yao, Zhi [1, 2]
Tang, Zhiqing [1]
Lou, Jiong [3]
Shen, Ping [1]
Jia, Weijia [1, 4]
Affiliations
[1] Beijing Normal Univ, Inst Artificial Intelligence & Future Networks, Beijing 519087, Peoples R China
[2] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[4] BNU HKBU United Int Coll, Guangdong Key Lab & Multi Modal Data Proc, Zhuhai 519087, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Edge Computing; Quality of Services; Vector Database; Multi-Agent Reinforcement Learning; Large Language Model; Request Scheduling;
DOI
10.1109/ICWS62655.2024.00105
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203 ; 0835 ;
Abstract
Large Language Models (LLMs) have gained significant popularity and are used extensively across various domains. Most LLM deployments run in cloud data centers, where they suffer substantial response delays and incur high costs, which degrades the Quality of Services (QoS) at the network edge. Using a vector database to cache LLM request results at the edge can substantially reduce the response delay and cost of similar requests, an opportunity that previous research has overlooked. To address this gap, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. First, we propose the VELO framework, which employs a vector database to cache the results of some LLM requests at the edge and thereby reduces the response time of subsequent similar requests. Unlike approaches that optimize the LLM directly, the VELO framework does not require altering the internal structure of the LLM and is broadly applicable to diverse LLMs. Second, building on the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm based on Multi-Agent Reinforcement Learning (MARL) that decides whether to request the LLM in the cloud or to return the results directly from the vector database at the edge. Moreover, to enhance request feature extraction and speed up training, we refine the MARL policy network and integrate expert demonstrations. Finally, we implement the proposed algorithm in a real edge system. Experimental results confirm that the VELO framework substantially improves user satisfaction by simultaneously reducing delay and resource consumption for edge users of LLMs.
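The edge-caching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper decides between the edge cache and the cloud LLM with a MARL policy, whereas the sketch below substitutes a fixed cosine-similarity threshold purely for illustration. The names `EdgeVectorCache`, `handle_request`, the `threshold` value, and the in-memory list standing in for a vector database are all hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EdgeVectorCache:
    """Hypothetical stand-in for an edge vector database (linear scan)."""

    def __init__(self, threshold: float = 0.9) -> None:
        # Assumption: a fixed threshold replaces the paper's learned MARL decision.
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def lookup(self, query: Vector) -> Optional[str]:
        """Return the cached answer of the most similar request, if similar enough."""
        best_score, best_answer = -1.0, None
        for vec, answer in self.entries:
            score = cosine_similarity(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query: Vector, answer: str) -> None:
        self.entries.append((query, answer))


def handle_request(
    cache: EdgeVectorCache,
    query_vec: Vector,
    prompt: str,
    cloud_llm: Callable[[str], str],
) -> Tuple[str, str]:
    """Serve from the edge cache when possible, otherwise call the cloud LLM."""
    cached = cache.lookup(query_vec)
    if cached is not None:
        return cached, "edge"
    answer = cloud_llm(prompt)
    cache.store(query_vec, answer)  # cache the result for later similar requests
    return answer, "cloud"
```

In this sketch the request embedding `query_vec` is assumed to be computed elsewhere, and `cloud_llm` is any callable that maps a prompt to a response. A learned policy, as described in the abstract, would replace the threshold test inside `lookup`.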
Pages: 865-876
Number of pages: 12