CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
DOI
10.1145/3701228
CLC number
TP [Automation and Computer Technology]
Subject classification code
0812
Abstract
Retrieval-augmented generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources, addressing common LLM limitations such as outdated information and the tendency to produce inaccurate, "hallucinated" content. However, evaluating RAG systems remains challenging. Most benchmarks focus primarily on question answering, neglecting other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or only the retriever in knowledge-intensive scenarios, overlooking how knowledge base construction and the retrieval component affect the entire pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale and more comprehensive benchmark and evaluates all components of RAG systems across a range of application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to summarizing extensive texts into more concise forms. For each CRUD category, we develop datasets to evaluate the performance of RAG systems. We also analyze how the components of the RAG system, such as the retriever, context length, knowledge base construction, and the LLM, affect overall performance. Finally, we provide useful insights for optimizing RAG technology for different scenarios. The source code is available on GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
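The components the abstract evaluates (knowledge base construction, the retriever, and the LLM) fit together in a standard pipeline. The toy sketch below illustrates that structure only; it is not the CRUD-RAG implementation, and every function name and parameter (`build_knowledge_base`, `retrieve`, `generate`, `chunk_size`, `top_k`) is illustrative. The word-overlap scorer stands in for a real dense or BM25 retriever, and the generator returns the assembled prompt instead of calling an actual LLM.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_knowledge_base(documents, chunk_size=40):
    """Split documents into fixed-size character chunks (one knowledge-base
    construction choice; chunking strategy affects the whole pipeline)."""
    return [doc[i:i + chunk_size]
            for doc in documents
            for i in range(0, len(doc), chunk_size)]

def retrieve(query, chunks, top_k=2):
    """Rank chunks by word overlap with the query -- a crude stand-in
    for a real dense or lexical retriever."""
    q = tokenize(query)
    scored = sorted(((len(q & tokenize(c)), c) for c in chunks), reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

def generate(query, context, llm=None):
    """Assemble the retrieval-augmented prompt; a real pipeline would
    pass it to an LLM here."""
    prompt = "Context:\n" + "\n".join(context) + f"\nQuestion: {query}\nAnswer:"
    return llm(prompt) if llm else prompt

docs = ["RAG augments large language models with retrieved external knowledge."]
kb = build_knowledge_base(docs)
ctx = retrieve("What is RAG?", kb)
prompt = generate("What is RAG?", ctx)
```

Each stage here corresponds to a variable the benchmark studies: `chunk_size` to knowledge base construction, `top_k` and the scorer to the retriever and context length, and the `llm` callable to the generation model.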
Pages: 32