CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
Keywords
DOI
10.1145/3701228
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains challenging. Most benchmarks focus primarily on question-answering applications, neglecting other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or only the retriever in knowledge-intensive settings, overlooking how external knowledge base construction and the retrieval component affect the entire RAG pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale, more comprehensive benchmark and evaluates all components of RAG systems across various application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to summarizing extensive texts into more concise forms. For each CRUD category, we develop distinct datasets to evaluate RAG system performance. We also analyze the effects of the RAG system's components, such as the retriever, context length, knowledge base construction, and the LLM itself. Finally, we provide useful insights for optimizing RAG technology in different scenarios. The source code is available at GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
Pages: 32
Related papers
50 items total
  • [11] Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications
    Miao, Jing
    Thongprayoon, Charat
    Suppadungsuk, Supawadee
    Valencia, Oscar A. Garcia
    Cheungpasitporn, Wisit
    MEDICINA-LITHUANIA, 2024, 60 (03):
  • [12] M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions
    Wang, Zheng
    Teo, Shu Xian
    Ouyang, Jieer
    Xu, Yongjun
    Shi, Wei
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1966 - 1978
  • [13] Zero-Shot ECG Diagnosis with Large Language Models and Retrieval-Augmented Generation
    Yu, Han
    Guo, Peikun
    Sano, Akane
    MACHINE LEARNING FOR HEALTH, ML4H, VOL 225, 2023, 225 : 650 - 663
  • [14] KGC-RAG: Knowledge Graph Construction from Large Language Model Using Retrieval-Augmented Generation
    Prabhong, Thin
    Kertkeidkachorn, Natthawut
    Trongratsameethong, Areerat
    CEUR Workshop Proceedings, 2024, 3853
  • [15] Resolving Unseen Rumors with Retrieval-Augmented Large Language Models
    Chen, Lei
    Wei, Zhongyu
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT IV, NLPCC 2024, 2025, 15362 : 319 - 332
  • [16] Retrieval-augmented generation versus document-grounded generation: a key distinction in large language models
    Hewitt, Katherine J.
    Wiest, Isabella C.
    Kather, Jakob N.
    JOURNAL OF PATHOLOGY CLINICAL RESEARCH, 2025, 11 (01):
  • [17] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 2, 2025, : 1183 - 1189
  • [18] Facilitating university admission using a chatbot based on large language models with retrieval-augmented generation
    Chen, Zheng
    Zou, Di
    Xie, Haoran
    Lou, Huajie
    Pang, Zhiyuan
    EDUCATIONAL TECHNOLOGY & SOCIETY, 2024, 27 (04): : 454 - 470
  • [19] Optimized interaction with Large Language Models: A practical guide to Prompt Engineering and Retrieval-Augmented Generation
    Fink, Anna
    Rau, Alexander
    Kotter, Elmar
    Bamberg, Fabian
    Russe, Maximilian Frederik
    RADIOLOGIE, 2025,
  • [20] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 1, 2025, : 1183 - 1189