CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
DOI
10.1145/3701228
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. It addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains a challenge. Most benchmarks focus primarily on question-answering applications and neglect other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or only the retriever in knowledge-intensive scenarios, overlooking the impact of external knowledge base construction and of the retrieval component on the full RAG pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale and more comprehensive benchmark and evaluates all components of RAG systems across a range of application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and correcting inaccuracies or inconsistencies in existing texts. "Delete" pertains to summarizing long texts into more concise forms. For each CRUD category, we develop dedicated datasets to evaluate the performance of RAG systems. We also analyze the effects of the individual components of a RAG system, such as the retriever, the context length, the construction of the knowledge base, and the LLM itself. Finally, we provide practical insights for optimizing RAG technology in different scenarios. The source code is available on GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
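As a concrete illustration of the pipeline the abstract describes (knowledge base construction, retrieval, and LLM generation), the following is a minimal sketch of a RAG loop. The toy TF-IDF retriever, the `call_llm` placeholder, and the example documents are illustrative assumptions, not the code released in the CRUD_RAG repository.

```python
# Minimal RAG pipeline sketch (illustrative only; not the CRUD_RAG benchmark code).
# A toy TF-IDF retriever stands in for the retrieval component, and call_llm()
# is a placeholder for a real LLM API call.
from collections import Counter
from math import log, sqrt


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class TfIdfRetriever:
    """Toy lexical retriever over an in-memory knowledge base."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        self.doc_tokens = [tokenize(d) for d in docs]
        df = Counter(t for toks in self.doc_tokens for t in set(toks))
        n = len(docs)
        self.idf = {t: log(n / (1 + c)) + 1.0 for t, c in df.items()}

    def _vector(self, tokens: list[str]) -> dict[str, float]:
        tf = Counter(tokens)
        return {t: tf[t] * self.idf.get(t, 0.0) for t in tf}

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        qv = self._vector(tokenize(query))

        def score(tokens: list[str]) -> float:
            dv = self._vector(tokens)
            dot = sum(qv[t] * dv.get(t, 0.0) for t in qv)
            norm = (sqrt(sum(v * v for v in qv.values()))
                    * sqrt(sum(v * v for v in dv.values()))) or 1.0
            return dot / norm

        ranked = sorted(range(len(self.docs)),
                        key=lambda i: score(self.doc_tokens[i]), reverse=True)
        return [self.docs[i] for i in ranked[:k]]


def call_llm(prompt: str) -> str:
    # Placeholder: in a real system this would call an LLM API.
    return f"[model output conditioned on a {len(prompt)}-character prompt]"


def rag_answer(query: str, retriever: TfIdfRetriever, k: int = 2) -> str:
    # Retrieve supporting passages, build a prompt, and generate an answer.
    context = "\n".join(retriever.retrieve(query, k))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)


if __name__ == "__main__":
    knowledge_base = [
        "RAG augments a language model with documents fetched from an external knowledge base.",
        "CRUD-RAG groups RAG applications into create, read, update, and delete scenarios.",
        "Retriever choice, context length, and knowledge base construction affect RAG quality.",
    ]
    print(rag_answer("What affects RAG quality?", TfIdfRetriever(knowledge_base)))
```

Swapping the toy retriever for a dense retriever, changing k, or altering how the knowledge base is chunked corresponds to the component-level factors (retriever, context length, knowledge base construction, LLM) that the benchmark analyzes.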
Pages: 32
Related Papers
50 records in total
  • [31] Wang, Dingqiao; Liang, Jiangbo; Ye, Jinguo; Li, Jingni; Li, Jingpeng; Zhang, Qikai; Hu, Qiuling; Pan, Caineng; Wang, Dongliang; Liu, Zhong; Shi, Wen; Shi, Danli; Li, Fei; Qu, Bo; Zheng, Yingfeng. Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study. JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26.
  • [32] Bhayana, Rajesh; Fawzy, Aly; Deng, Yangqing; Bleakney, Robert R.; Krishna, Satheesh. Retrieval-Augmented Generation for Large Language Models in Radiology: Another Leap Forward in Board Examination Performance. RADIOLOGY, 2024, 313 (01).
  • [33] Kim, Gangwoo; Kim, Sungdong; Jeon, Byeongguk; Park, Joonsuk; Kang, Jaewoo. Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models. 2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 996-1009.
  • [34] Di Palma, Dario. Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models. PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023: 1369-1373.
  • [35] Vizniuk, Artem; Diachenko, Grygorii; Laktionov, Ivan; Siwocha, Agnieszka; Xiao, Min; Smolag, Jacek. A Comprehensive Survey of Retrieval-Augmented Large Language Models for Decision Making in Agriculture: Unsolved Problems and Research Opportunities. JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2025, 15 (02): 115-146.
  • [36] Lee, Jungwon; Ahn, Seungjun; Kim, Daeho; Kim, Dongkyun. Performance comparison of retrieval-augmented generation and fine-tuned large language models for construction safety management knowledge retrieval. AUTOMATION IN CONSTRUCTION, 2024, 168.
  • [37] Li, Jiangtong; Lei, Yang; Bian, Yuxuan; Cheng, Dawei; Ding, Zhijun; Jiang, Changjun. RA-CFGPT: Chinese financial assistant with retrieval-augmented large language model. FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (05).
  • [38] Hindi, Mahd; Mohammed, Linda; Maaz, Ommama; Alwarafy, Abdulmalik. Enhancing the Precision and Interpretability of Retrieval-Augmented Generation (RAG) in Legal Technology: A Survey. IEEE ACCESS, 2025, 13: 46171-46189.
  • [39] Yang, Kaiyu; Swope, Aidan M.; Gu, Alex; Chalamala, Rahul; Song, Peiyang; Yu, Shixing; Godil, Saad; Prenger, Ryan; Anandkumar, Anima. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [40] Muludi, Kurnia; Fitria, Kaira Milani; Triloka, Joko; Sutedi. Retrieval-Augmented Generation Approach: Document Question Answering using Large Language Model. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (03): 776-785.