CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
DOI
10.1145/3701228
CLC classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains challenging. Most benchmarks focus primarily on question-answering applications, neglecting other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or the retriever in knowledge-intensive scenarios, overlooking the impact of external knowledge base construction and of the retrieval component on the pipeline as a whole in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale, more comprehensive benchmark and evaluates all components of RAG systems across a variety of application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and correcting inaccuracies or inconsistencies in existing texts. "Delete" pertains to summarizing extensive texts into more concise forms. For each CRUD category, we have developed distinct datasets to evaluate the performance of RAG systems. We also analyze the effects of various components of the RAG system, such as the retriever, context length, knowledge base construction, and the LLM. Finally, we provide useful insights for optimizing RAG technology in different scenarios. The source code is available at GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
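The abstract names the RAG components the benchmark varies: knowledge base construction, the retriever, context length, and the LLM. The toy sketch below shows how these pieces fit together in a generic RAG pipeline; it is illustrative only, with a word-overlap retriever standing in for a real dense retriever and a prompt-assembly function standing in for the LLM call, and all function names are hypothetical rather than taken from the paper's code.

```python
# Illustrative RAG pipeline: KB construction -> retrieval -> generation.
# A minimal sketch, not the CRUD-RAG implementation.

def build_knowledge_base(documents, chunk_size=50):
    """KB construction: split each document into fixed-size word chunks."""
    chunks = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

def retrieve(query, chunks, top_k=2):
    """Retriever: rank chunks by word overlap with the query
    (a stand-in for a dense or BM25 retriever)."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]

def generate(query, context_chunks):
    """Generation: assemble the prompt an LLM would receive.
    (The LLM call itself is stubbed out here.)"""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "RAG augments large language models with retrieved external knowledge.",
    "CRUD-RAG groups RAG applications into create, read, update, delete.",
]
kb = build_knowledge_base(docs, chunk_size=10)
prompt = generate("What does CRUD-RAG evaluate?",
                  retrieve("CRUD RAG applications", kb))
```

Each stage corresponds to an axis the benchmark studies: swapping `build_knowledge_base`'s chunking, the `retrieve` strategy, `top_k` (context length), or the model behind `generate` changes end-to-end quality, which is why the benchmark evaluates them jointly rather than the LLM alone.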
Pages: 32