Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

Cited: 0
Authors
Jiang, Wenqi [1 ]
Zeller, Marco [1 ]
Waleffe, Roger [2 ]
Hoefler, Torsten [3 ]
Alonso, Gustavo [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Syst Grp, Zurich, Switzerland
[2] Univ Wisconsin Madison, Madison, WI USA
[3] Swiss Fed Inst Technol, SPCL, Zurich, Switzerland
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Vol. 18 / No. 1
Keywords
NEAREST-NEIGHBOR SEARCH; SIMILARITY SEARCH; QUANTIZATION; ENGINE; VECTOR;
DOI
10.14778/3696435.3696439
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
A Retrieval-Augmented Language Model (RALM) combines a large language model (LLM) with a vector database to retrieve context-specific knowledge during text generation. This strategy facilitates impressive generation quality even with smaller models, thus reducing computational demands by orders of magnitude. To serve RALMs efficiently and flexibly, we propose Chameleon, a heterogeneous accelerator system integrating both LLM and vector search accelerators in a disaggregated architecture. The heterogeneity ensures efficient serving for both inference and retrieval, while the disaggregation allows independent scaling of LLM and vector search accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements vector search accelerators on FPGAs and assigns LLM inference to GPUs, with CPUs as cluster coordinators. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. The promising results pave the way for adopting heterogeneous accelerators for not only LLM inference but also vector search in future RALM systems.
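The abstract describes the RALM serving pattern that Chameleon accelerates: a vector search over a knowledge base, followed by LLM generation conditioned on the retrieved context. A minimal sketch of that retrieve-then-generate flow, with a brute-force in-memory index standing in for the FPGA-based vector search accelerators and a stub function standing in for GPU-hosted LLM inference (all names here are illustrative assumptions, not Chameleon's actual API):

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class VectorDB:
    """Toy brute-force k-NN index over (embedding, passage) pairs.
    Stands in for the disaggregated, FPGA-accelerated search tier."""
    def __init__(self):
        self.items = []

    def add(self, emb, passage):
        self.items.append((emb, passage))

    def search(self, query_emb, k=1):
        # Rank all stored passages by similarity; return the top-k texts.
        ranked = sorted(self.items,
                        key=lambda it: cosine(query_emb, it[0]),
                        reverse=True)
        return [passage for _, passage in ranked[:k]]

def generate(prompt, context):
    # Stand-in for LLM inference on the GPU tier: a real system would
    # prepend the retrieved passages to the prompt and decode tokens.
    return f"{prompt} [context: {'; '.join(context)}]"

# Populate the "database" and serve one RALM request.
db = VectorDB()
db.add([1.0, 0.0], "FPGAs accelerate vector search.")
db.add([0.0, 1.0], "GPUs accelerate LLM inference.")

ctx = db.search([0.9, 0.1], k=1)      # retrieval step
answer = generate("What accelerates retrieval?", ctx)  # generation step
```

In the paper's architecture the two steps run on independently scalable accelerator pools coordinated by CPUs, rather than inline in one process as in this sketch.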
Pages: 42-52
Page count: 11
Related Papers
50 records in total
  • [31] Can Small Language Models With Retrieval-Augmented Generation Replace Large Language Models When Learning Computer Science?
    Liu, Suqing
    Yu, Zezhu
    Huang, Feiran
    Bulbulia, Yousef
    Bergen, Andreas
    Liut, Michael
    PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024, 2024, : 388 - 393
  • [32] Building a Coding Assistant via the Retrieval-Augmented Language Model
    Li, Xinze
    Wang, Hanbin
    Liu, Zhenghao
    Yu, Shi
    Wang, Shuo
    Yan, Yukun
    Fu, Yukai
    Gu, Yu
    Yu, Ge
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)
  • [33] Learning Customized Visual Models with Retrieval-Augmented Knowledge
    Liu, Haotian
    Son, Kilho
    Yang, Jianwei
    Liu, Ce
    Gao, Jianfeng
    Lee, Yong Jae
    Li, Chunyuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15148 - 15158
  • [34] REALM: Retrieval-Augmented Language Model Pre-Training
    Guu, Kelvin
    Lee, Kenton
    Tung, Zora
    Pasupat, Panupong
    Chang, Ming-Wei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [35] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 2, 2025, : 1183 - 1189
  • [36] Layered Query Retrieval: An Adaptive Framework for Retrieval-Augmented Generation in Complex Question Answering for Large Language Models
    Huang, Jie
    Wang, Mo
    Cui, Yunpeng
    Liu, Juan
    Chen, Li
    Wang, Ting
    Li, Huan
    Wu, Jinming
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [37] Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models
    Louis, Antoine
    van Dijck, Gijs
    Spanakis, Gerasimos
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22266 - 22275
  • [38] Facilitating university admission using a chatbot based on large language models with retrieval-augmented generation
    Chen, Zheng
    Zou, Di
    Xie, Haoran
    Lou, Huajie
    Pang, Zhiyuan
    EDUCATIONAL TECHNOLOGY & SOCIETY, 2024, 27 (04): : 454 - 470
  • [39] DEVELOPMENT OF A RETRIEVAL-AUGMENTED GENERATION PIPELINE LEVERAGING LARGE LANGUAGE MODELS TO SUPPORT EVIDENCE SYNTHESIS
    Perera, C.
    Heron, L.
    Hirst, A.
    VALUE IN HEALTH, 2024, 27 (12)
  • [40] CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
    Lyu, Yuanjie
    Li, Zhiyu
    Niu, Simin
    Xiong, Feiyu
    Tang, Bo
    Wang, Wenjin
    Wu, Hao
    Liu, Huanyong
    Xu, Tong
    Chen, Enhong
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)