CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
DOI
10.1145/3701228
CLC classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains challenging. Most benchmarks focus primarily on question-answering applications, neglecting other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or the retriever in knowledge-intensive scenarios, overlooking the impact of external knowledge base construction and of the retrieval component on the pipeline as a whole in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale, more comprehensive benchmark and evaluates all components of RAG systems across a variety of application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and correcting inaccuracies or inconsistencies in existing texts. "Delete" pertains to summarizing extensive texts into more concise forms. For each CRUD category, we have developed distinct datasets to evaluate the performance of RAG systems. We also analyze the effects of various components of the RAG system, such as the retriever, context length, knowledge base construction, and the LLM. Finally, we provide useful insights for optimizing RAG technology in different scenarios. The source code is available at GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
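The abstract names the RAG components the benchmark varies: knowledge base construction, the retriever, context length, and the LLM. The toy sketch below shows how these pieces fit together in a generic RAG pipeline; it is illustrative only, with a word-overlap retriever standing in for a real dense retriever and a prompt-assembly function standing in for the LLM call, and all function names are hypothetical rather than taken from the paper's code.

```python
# Illustrative RAG pipeline: KB construction -> retrieval -> generation.
# A minimal sketch, not the CRUD-RAG implementation.

def build_knowledge_base(documents, chunk_size=50):
    """KB construction: split each document into fixed-size word chunks."""
    chunks = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

def retrieve(query, chunks, top_k=2):
    """Retriever: rank chunks by word overlap with the query
    (a stand-in for a dense or BM25 retriever)."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]

def generate(query, context_chunks):
    """Generation: assemble the prompt an LLM would receive.
    (The LLM call itself is stubbed out here.)"""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "RAG augments large language models with retrieved external knowledge.",
    "CRUD-RAG groups RAG applications into create, read, update, delete.",
]
kb = build_knowledge_base(docs, chunk_size=10)
prompt = generate("What does CRUD-RAG evaluate?",
                  retrieve("CRUD RAG applications", kb))
```

Each stage corresponds to an axis the benchmark studies: swapping `build_knowledge_base`'s chunking, the `retrieve` strategy, `top_k` (context length), or the model behind `generate` changes end-to-end quality, which is why the benchmark evaluates them jointly rather than the LLM alone.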
Pages: 32