CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
Keywords
DOI
10.1145/3701228
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains a challenge. Most benchmarks focus primarily on question-answering applications, neglecting other scenarios where RAG could be beneficial. Moreover, in their experiments, these benchmarks often assess only the LLM component of the RAG pipeline, or the retriever in knowledge-intensive scenarios, overlooking the impact of external knowledge base construction and of the retrieval component on the entire RAG pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale, more comprehensive benchmark and evaluates all the components of RAG systems across a variety of RAG application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves responding to intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to the task of summarizing extensive texts into more concise forms. For each of these CRUD categories, we have developed different datasets to evaluate the performance of RAG systems. We also analyze the effects of various components of the RAG system, such as the retriever, context length, knowledge base construction, and LLM. Finally, we provide useful insights for optimizing RAG technology for different scenarios. The source code is available at GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
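The abstract enumerates the pipeline components the benchmark varies (knowledge base construction, retriever, context length, LLM) and the four CRUD scenario types. As a rough illustration of how those pieces fit together, below is a minimal sketch in Python. It is not code from the CRUD_RAG repository: every name (build_knowledge_base, retrieve, CRUD_TASKS, run_rag) is hypothetical, the prompt templates are invented placeholders, and a toy word-overlap retriever stands in for whatever dense or sparse retriever a real system would use.

```python
# Illustrative sketch only: names and templates are hypothetical, not the
# CRUD_RAG repository's API. It mirrors the knobs the benchmark studies:
# knowledge-base construction (chunking), the retriever, context length
# (top_k), and the LLM, applied to the four CRUD scenario types.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    text: str


def build_knowledge_base(corpus: List[str], chunk_size: int = 256) -> List[Document]:
    """Chunk raw articles into fixed-size passages (knowledge-base construction)."""
    chunks = []
    for article in corpus:
        for i in range(0, len(article), chunk_size):
            chunks.append(Document(article[i:i + chunk_size]))
    return chunks


def retrieve(query: str, kb: List[Document], top_k: int = 4) -> List[Document]:
    """Toy lexical retriever: rank chunks by word overlap with the query."""
    def overlap(doc: Document) -> int:
        return len(set(query.split()) & set(doc.text.split()))
    return sorted(kb, key=overlap, reverse=True)[:top_k]


# Four CRUD scenarios, each pairing the retrieved context with a different
# task framing: creative continuation, QA, error correction, summarization.
CRUD_TASKS = {
    "create": "Write a continuation of the following text:\n{context}\n{query}",
    "read":   "Answer the question using the context.\nContext: {context}\nQuestion: {query}",
    "update": "Correct the factual errors in the text using the context.\nContext: {context}\nText: {query}",
    "delete": "Summarize the key points, using the context for background.\nContext: {context}\nText: {query}",
}


def run_rag(task: str, query: str, kb: List[Document],
            llm: Callable[[str], str], top_k: int = 4) -> str:
    """Retrieve supporting chunks, fill the scenario prompt, and call the LLM."""
    context = "\n".join(d.text for d in retrieve(query, kb, top_k))
    prompt = CRUD_TASKS[task].format(context=context, query=query)
    return llm(prompt)
```

Under this framing, the benchmark's component analyses roughly correspond to varying chunk_size (knowledge base construction), the retrieval function, top_k (context length), and the llm callable, while the task key selects the CRUD scenario.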
Pages: 32
Related Papers (50 total)
  • [41] Tozuka, Ryota; Johno, Hisashi; Amakawa, Akitomo; Sato, Junichi; Muto, Mizuki; Seki, Shoichiro; Komaba, Atsushi; Onishi, Hiroshi. Application of NotebookLM, a large language model with retrieval-augmented generation, for lung cancer staging. Japanese Journal of Radiology, 2024: 706-712.
  • [42] Xu, Haocheng; Hu, Haotian; Huang, Sitao. Optimizing High-Level Synthesis Designs with Retrieval-Augmented Large Language Models. 2024 IEEE LLM Aided Design Workshop (LAD 2024), 2024.
  • [43] Ndimbo, Edmund V.; Luo, Qin; Fernando, Gimo C.; Yang, Xu; Wang, Bang. Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems. Applied Sciences-Basel, 2025, 15(02).
  • [44] Jiang, Cheng; Zhang, Pengle; Ni, Ying; Wang, Xiaoli; Peng, Hanghang; Liu, Sen; Fei, Mengdi; He, Yuxin; Xiao, Yaxuan; Huang, Jin; Ma, Xingyu; Yang, Tian. Multimodal retrieval-augmented generation for financial documents: image-centric analysis of charts and tables with large language models. Visual Computer, 2025.
  • [45] Jeong, Minbyul; Sohn, Jiwoong; Sung, Mujeen; Kang, Jaewoo. Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models. Bioinformatics, 2024, 40: i119-i129.
  • [46] Wei, Chuyuan; Duan, Ke; Zhuo, Shengda; Wang, Hongchun; Huang, Shuqiang; Liu, Jie. Enhanced Recommendation Systems with Retrieval-Augmented Large Language Model. Journal of Artificial Intelligence Research, 2025, 82: 1147-1173.
  • [47] Yan, Mengyi; Rene, Weilong; Wang, Yaoshu; Li, Jianxin. A Retrieval-Augmented Framework for Tabular Interpretation with Large Language Model. Database Systems for Advanced Applications (DASFAA 2024), Pt 2, 2025, 14851: 341-356.
  • [48] Chen, Lun-Chi; Pardeshi, Mayuresh Sunil; Liao, Yi-Xiang; Pai, Kai-Chih. Application of retrieval-augmented generation for interactive industrial knowledge management via a large language model. Computer Standards & Interfaces, 2025, 94.
  • [49] Leite, Marcus Vinicius; Abe, Jair Minoro; Souza, Marcos Leandro Hoffmann; Naas, Irenilza de Alencar. Enhancing Environmental Control in Broiler Production: Retrieval-Augmented Generation for Improved Decision-Making with Large Language Models. AgriEngineering, 2025, 7(01).
  • [50] Wang, Mengzhao; Wu, Haotian; Ke, Xiangyu; Gao, Yunjun; Xu, Xiaoliang; Chen, Lu. An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models. Proceedings of the VLDB Endowment, 2024, 17(12): 4333-4336.