Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

Cited by: 0
Authors
Jiang, Wenqi [1 ]
Zeller, Marco [1 ]
Waleffe, Roger [2 ]
Hoefler, Torsten [3 ]
Alonso, Gustavo [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Syst Grp, Zurich, Switzerland
[2] Univ Wisconsin Madison, Madison, WI USA
[3] Swiss Fed Inst Technol, SPCL, Zurich, Switzerland
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024, Vol. 18, No. 1
Keywords
NEAREST-NEIGHBOR SEARCH; SIMILARITY SEARCH; QUANTIZATION; ENGINE; VECTOR;
DOI
10.14778/3696435.3696439
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline Classification Code
0812
Abstract
A Retrieval-Augmented Language Model (RALM) combines a large language model (LLM) with a vector database to retrieve context-specific knowledge during text generation. This strategy facilitates impressive generation quality even with smaller models, thus reducing computational demands by orders of magnitude. To serve RALMs efficiently and flexibly, we propose Chameleon, a heterogeneous accelerator system integrating both LLM and vector search accelerators in a disaggregated architecture. The heterogeneity ensures efficient serving for both inference and retrieval, while the disaggregation allows independent scaling of LLM and vector search accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements vector search accelerators on FPGAs and assigns LLM inference to GPUs, with CPUs as cluster coordinators. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. The promising results pave the way for adopting heterogeneous accelerators for not only LLM inference but also vector search in future RALM systems.
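The abstract describes a retrieve-then-generate serving flow in which a CPU coordinator dispatches nearest-neighbor search to dedicated retrieval accelerators (FPGAs in the Chameleon prototype) and token generation to LLM accelerators (GPUs). The Python sketch below illustrates only that coordination pattern; it is not the paper's implementation, and all class names, method names, and stub backends (VectorSearchService, LLMService, RALMCoordinator, InMemoryRetriever, EchoLLM) are hypothetical placeholders for the disaggregated services.

    # Minimal, illustrative sketch (not Chameleon's code) of a disaggregated
    # RALM serving loop: the coordinator calls a vector-search service and an
    # LLM service that could scale independently behind these interfaces.
    from dataclasses import dataclass
    from typing import List, Protocol


    @dataclass
    class Passage:
        doc_id: int
        text: str
        score: float


    class VectorSearchService(Protocol):
        # Hypothetical interface for the retrieval accelerator (FPGA side).
        def top_k(self, query_embedding: List[float], k: int) -> List[Passage]: ...


    class LLMService(Protocol):
        # Hypothetical interface for the inference accelerator (GPU side).
        def generate(self, prompt: str) -> str: ...


    class RALMCoordinator:
        """CPU-side coordinator that ties retrieval and generation together."""

        def __init__(self, retriever: VectorSearchService, llm: LLMService, k: int = 3):
            self.retriever = retriever
            self.llm = llm
            self.k = k

        def answer(self, query: str, query_embedding: List[float]) -> str:
            # 1) Dispatch nearest-neighbor search to the retrieval service.
            passages = self.retriever.top_k(query_embedding, self.k)
            # 2) Prepend the retrieved context and dispatch generation.
            context = "\n".join(p.text for p in passages)
            prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
            return self.llm.generate(prompt)


    # Stub backends so the sketch runs end to end without real hardware.
    class InMemoryRetriever:
        def top_k(self, query_embedding, k):
            return [Passage(i, f"passage {i}", 1.0 - 0.1 * i) for i in range(k)]


    class EchoLLM:
        def generate(self, prompt):
            return f"[generated answer conditioned on {prompt.count('passage')} passages]"


    if __name__ == "__main__":
        coordinator = RALMCoordinator(InMemoryRetriever(), EchoLLM(), k=3)
        print(coordinator.answer("What is Chameleon?", [0.1, 0.2, 0.3]))

In the paper's setting, the two interfaces would front pools of FPGA retrieval accelerators and GPU inference servers, which is what allows the two resource types to be scaled independently.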
Pages: 42-52
Number of pages: 11