Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

Cited by: 0
Authors
Jiang, Wenqi [1]
Zeller, Marco [1]
Waleffe, Roger [2]
Hoefler, Torsten [3]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Fed Inst Technol, Syst Grp, Zurich, Switzerland
[2] Univ Wisconsin Madison, Madison, WI USA
[3] Swiss Fed Inst Technol, SPCL, Zurich, Switzerland
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Vol. 18 / No. 01
Keywords
NEAREST-NEIGHBOR SEARCH; SIMILARITY SEARCH; QUANTIZATION; ENGINE; VECTOR;
DOI
10.14778/3696435.3696439
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
A Retrieval-Augmented Language Model (RALM) combines a large language model (LLM) with a vector database to retrieve context-specific knowledge during text generation. This strategy facilitates impressive generation quality even with smaller models, thus reducing computational demands by orders of magnitude. To serve RALMs efficiently and flexibly, we propose Chameleon, a heterogeneous accelerator system integrating both LLM and vector search accelerators in a disaggregated architecture. The heterogeneity ensures efficient serving for both inference and retrieval, while the disaggregation allows independent scaling of LLM and vector search accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements vector search accelerators on FPGAs and assigns LLM inference to GPUs, with CPUs as cluster coordinators. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. The promising results pave the way for adopting heterogeneous accelerators for not only LLM inference but also vector search in future RALM systems.
Pages: 42-52
Page count: 11