CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
Keywords
DOI
10.1145/3701228
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains challenging. Most benchmarks focus primarily on question-answering applications, neglecting other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or only the retriever in knowledge-intensive settings, overlooking how external knowledge base construction and the retrieval component affect the entire RAG pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale, more comprehensive benchmark and evaluates all components of RAG systems across various application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to summarizing extensive texts into more concise forms. For each CRUD category, we develop distinct datasets to evaluate RAG system performance. We also analyze the effects of the RAG system's components, such as the retriever, context length, knowledge base construction, and the LLM itself. Finally, we provide useful insights for optimizing RAG technology in different scenarios. The source code is available at GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
Pages: 32
Related papers
50 items total
  • [11] Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications
    Miao, Jing
    Thongprayoon, Charat
    Suppadungsuk, Supawadee
    Valencia, Oscar A. Garcia
    Cheungpasitporn, Wisit
    MEDICINA-LITHUANIA, 2024, 60 (03):
  • [12] M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions
    Wang, Zheng
    Teo, Shu Xian
    Ouyang, Jieer
    Xu, Yongjun
    Shi, Wei
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1966 - 1978
  • [13] Zero-Shot ECG Diagnosis with Large Language Models and Retrieval-Augmented Generation
    Yu, Han
    Guo, Peikun
    Sano, Akane
    MACHINE LEARNING FOR HEALTH, ML4H, VOL 225, 2023, 225 : 650 - 663
  • [14] KGC-RAG: Knowledge Graph Construction from Large Language Model Using Retrieval-Augmented Generation
    Prabhong, Thin
    Kertkeidkachorn, Natthawut
    Trongratsameethong, Areerat
    CEUR Workshop Proceedings, 2024, 3853
  • [15] Resolving Unseen Rumors with Retrieval-Augmented Large Language Models
    Chen, Lei
    Wei, Zhongyu
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT IV, NLPCC 2024, 2025, 15362 : 319 - 332
  • [16] Retrieval-augmented generation versus document-grounded generation: a key distinction in large language models
    Hewitt, Katherine J.
    Wiest, Isabella C.
    Kather, Jakob N.
    JOURNAL OF PATHOLOGY CLINICAL RESEARCH, 2025, 11 (01):
  • [17] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 2, 2025, : 1183 - 1189
  • [18] Facilitating university admission using a chatbot based on large language models with retrieval-augmented generation
    Chen, Zheng
    Zou, Di
    Xie, Haoran
    Lou, Huajie
    Pang, Zhiyuan
    EDUCATIONAL TECHNOLOGY & SOCIETY, 2024, 27 (04): : 454 - 470
  • [19] Optimized interaction with Large Language Models: A practical guide to Prompt Engineering and Retrieval-Augmented Generation
    Fink, Anna
    Rau, Alexander
    Kotter, Elmar
    Bamberg, Fabian
    Russe, Maximilian Frederik
    RADIOLOGIE, 2025,
  • [20] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 1, 2025, : 1183 - 1189