Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

Cited: 0
Authors
Jiang, Wenqi [1 ]
Zeller, Marco [1 ]
Waleffe, Roger [2 ]
Hoefler, Torsten [3 ]
Alonso, Gustavo [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Syst Grp, Zurich, Switzerland
[2] Univ Wisconsin Madison, Madison, WI USA
[3] Swiss Fed Inst Technol, SPCL, Zurich, Switzerland
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Vol. 18 / No. 1
Keywords
NEAREST-NEIGHBOR SEARCH; SIMILARITY SEARCH; QUANTIZATION; ENGINE; VECTOR;
DOI
10.14778/3696435.3696439
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
A Retrieval-Augmented Language Model (RALM) combines a large language model (LLM) with a vector database to retrieve context-specific knowledge during text generation. This strategy facilitates impressive generation quality even with smaller models, thus reducing computational demands by orders of magnitude. To serve RALMs efficiently and flexibly, we propose Chameleon, a heterogeneous accelerator system integrating both LLM and vector search accelerators in a disaggregated architecture. The heterogeneity ensures efficient serving for both inference and retrieval, while the disaggregation allows independent scaling of LLM and vector search accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements vector search accelerators on FPGAs and assigns LLM inference to GPUs, with CPUs as cluster coordinators. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. The promising results pave the way for adopting heterogeneous accelerators for not only LLM inference but also vector search in future RALM systems.
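The abstract describes the RALM serving pattern that Chameleon accelerates: a vector search over a knowledge base, followed by LLM generation conditioned on the retrieved context. A minimal sketch of that retrieve-then-generate flow, with a brute-force in-memory index standing in for the FPGA-based vector search accelerators and a stub function standing in for GPU-hosted LLM inference (all names here are illustrative assumptions, not Chameleon's actual API):

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class VectorDB:
    """Toy brute-force k-NN index over (embedding, passage) pairs.
    Stands in for the disaggregated, FPGA-accelerated search tier."""
    def __init__(self):
        self.items = []

    def add(self, emb, passage):
        self.items.append((emb, passage))

    def search(self, query_emb, k=1):
        # Rank all stored passages by similarity; return the top-k texts.
        ranked = sorted(self.items,
                        key=lambda it: cosine(query_emb, it[0]),
                        reverse=True)
        return [passage for _, passage in ranked[:k]]

def generate(prompt, context):
    # Stand-in for LLM inference on the GPU tier: a real system would
    # prepend the retrieved passages to the prompt and decode tokens.
    return f"{prompt} [context: {'; '.join(context)}]"

# Populate the "database" and serve one RALM request.
db = VectorDB()
db.add([1.0, 0.0], "FPGAs accelerate vector search.")
db.add([0.0, 1.0], "GPUs accelerate LLM inference.")

ctx = db.search([0.9, 0.1], k=1)      # retrieval step
answer = generate("What accelerates retrieval?", ctx)  # generation step
```

In the paper's architecture the two steps run on independently scalable accelerator pools coordinated by CPUs, rather than inline in one process as in this sketch.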
Pages: 42-52
Page count: 11
Related Papers
50 records in total
  • [31] Can Small Language Models With Retrieval-Augmented Generation Replace Large Language Models When Learning Computer Science?
    Liu, Suqing
    Yu, Zezhu
    Huang, Feiran
    Bulbulia, Yousef
    Bergen, Andreas
    Liut, Michael
    PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024, 2024, : 388 - 393
  • [32] Building a Coding Assistant via the Retrieval-Augmented Language Model
    Li, Xinze
    Wang, Hanbin
    Liu, Zhenghao
    Yu, Shi
    Wang, Shuo
    Yan, Yukun
    Fu, Yukai
    Gu, Yu
    Yu, Ge
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)
  • [33] Learning Customized Visual Models with Retrieval-Augmented Knowledge
    Liu, Haotian
    Son, Kilho
    Yang, Jianwei
    Liu, Ce
    Gao, Jianfeng
    Lee, Yong Jae
    Li, Chunyuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15148 - 15158
  • [34] REALM: Retrieval-Augmented Language Model Pre-Training
    Guu, Kelvin
    Lee, Kenton
    Tung, Zora
    Pasupat, Panupong
    Chang, Ming-Wei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [35] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 2, 2025, : 1183 - 1189
  • [36] Layered Query Retrieval: An Adaptive Framework for Retrieval-Augmented Generation in Complex Question Answering for Large Language Models
    Huang, Jie
    Wang, Mo
    Cui, Yunpeng
    Liu, Juan
    Chen, Li
    Wang, Ting
    Li, Huan
    Wu, Jinming
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [37] Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models
    Louis, Antoine
    van Dijck, Gijs
    Spanakis, Gerasimos
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22266 - 22275
  • [38] Facilitating university admission using a chatbot based on large language models with retrieval-augmented generation
    Chen, Zheng
    Zou, Di
    Xie, Haoran
    Lou, Huajie
    Pang, Zhiyuan
    EDUCATIONAL TECHNOLOGY & SOCIETY, 2024, 27 (04): : 454 - 470
  • [39] DEVELOPMENT OF A RETRIEVAL-AUGMENTED GENERATION PIPELINE LEVERAGING LARGE LANGUAGE MODELS TO SUPPORT EVIDENCE SYNTHESIS
    Perera, C.
    Heron, L.
    Hirst, A.
    VALUE IN HEALTH, 2024, 27 (12)
  • [40] CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
    Lyu, Yuanjie
    Li, Zhiyu
    Niu, Simin
    Xiong, Feiyu
    Tang, Bo
    Wang, Wenjin
    Wu, Hao
    Liu, Huanyong
    Xu, Tong
    Chen, Enhong
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)