CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
Keywords
DOI
10.1145/3701228
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains a challenge. Most benchmarks focus primarily on question-answering applications, neglecting other scenarios where RAG could be beneficial. Moreover, in their experiments, these benchmarks often assess only the LLM component of the RAG pipeline, or the retriever in knowledge-intensive scenarios, overlooking the impact of external knowledge base construction and of the retrieval component on the entire RAG pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale, more comprehensive benchmark and evaluates all the components of RAG systems across a variety of RAG application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves responding to intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to the task of summarizing extensive texts into more concise forms. For each of these CRUD categories, we have developed different datasets to evaluate the performance of RAG systems. We also analyze the effects of various components of the RAG system, such as the retriever, context length, knowledge base construction, and LLM. Finally, we provide useful insights for optimizing RAG technology for different scenarios. The source code is available at GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
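The abstract enumerates the pipeline components the benchmark varies (knowledge base construction, retriever, context length, LLM) and the four CRUD scenario types. As a rough illustration of how those pieces fit together, below is a minimal sketch in Python. It is not code from the CRUD_RAG repository: every name (build_knowledge_base, retrieve, CRUD_TASKS, run_rag) is hypothetical, the prompt templates are invented placeholders, and a toy word-overlap retriever stands in for whatever dense or sparse retriever a real system would use.

```python
# Illustrative sketch only: names and templates are hypothetical, not the
# CRUD_RAG repository's API. It mirrors the knobs the benchmark studies:
# knowledge-base construction (chunking), the retriever, context length
# (top_k), and the LLM, applied to the four CRUD scenario types.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    text: str


def build_knowledge_base(corpus: List[str], chunk_size: int = 256) -> List[Document]:
    """Chunk raw articles into fixed-size passages (knowledge-base construction)."""
    chunks = []
    for article in corpus:
        for i in range(0, len(article), chunk_size):
            chunks.append(Document(article[i:i + chunk_size]))
    return chunks


def retrieve(query: str, kb: List[Document], top_k: int = 4) -> List[Document]:
    """Toy lexical retriever: rank chunks by word overlap with the query."""
    def overlap(doc: Document) -> int:
        return len(set(query.split()) & set(doc.text.split()))
    return sorted(kb, key=overlap, reverse=True)[:top_k]


# Four CRUD scenarios, each pairing the retrieved context with a different
# task framing: creative continuation, QA, error correction, summarization.
CRUD_TASKS = {
    "create": "Write a continuation of the following text:\n{context}\n{query}",
    "read":   "Answer the question using the context.\nContext: {context}\nQuestion: {query}",
    "update": "Correct the factual errors in the text using the context.\nContext: {context}\nText: {query}",
    "delete": "Summarize the key points, using the context for background.\nContext: {context}\nText: {query}",
}


def run_rag(task: str, query: str, kb: List[Document],
            llm: Callable[[str], str], top_k: int = 4) -> str:
    """Retrieve supporting chunks, fill the scenario prompt, and call the LLM."""
    context = "\n".join(d.text for d in retrieve(query, kb, top_k))
    prompt = CRUD_TASKS[task].format(context=context, query=query)
    return llm(prompt)
```

Under this framing, the benchmark's component analyses roughly correspond to varying chunk_size (knowledge base construction), the retrieval function, top_k (context length), and the llm callable, while the task key selects the CRUD scenario.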
Pages: 32
Related Papers (50 total)
  • [41] Tozuka, Ryota; Johno, Hisashi; Amakawa, Akitomo; Sato, Junichi; Muto, Mizuki; Seki, Shoichiro; Komaba, Atsushi; Onishi, Hiroshi. Application of NotebookLM, a large language model with retrieval-augmented generation, for lung cancer staging. Japanese Journal of Radiology, 2024: 706-712.
  • [42] Xu, Haocheng; Hu, Haotian; Huang, Sitao. Optimizing High-Level Synthesis Designs with Retrieval-Augmented Large Language Models. 2024 IEEE LLM Aided Design Workshop (LAD 2024), 2024.
  • [43] Ndimbo, Edmund V.; Luo, Qin; Fernando, Gimo C.; Yang, Xu; Wang, Bang. Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems. Applied Sciences-Basel, 2025, 15(02).
  • [44] Jiang, Cheng; Zhang, Pengle; Ni, Ying; Wang, Xiaoli; Peng, Hanghang; Liu, Sen; Fei, Mengdi; He, Yuxin; Xiao, Yaxuan; Huang, Jin; Ma, Xingyu; Yang, Tian. Multimodal retrieval-augmented generation for financial documents: image-centric analysis of charts and tables with large language models. Visual Computer, 2025.
  • [45] Jeong, Minbyul; Sohn, Jiwoong; Sung, Mujeen; Kang, Jaewoo. Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models. Bioinformatics, 2024, 40: i119-i129.
  • [46] Wei, Chuyuan; Duan, Ke; Zhuo, Shengda; Wang, Hongchun; Huang, Shuqiang; Liu, Jie. Enhanced Recommendation Systems with Retrieval-Augmented Large Language Model. Journal of Artificial Intelligence Research, 2025, 82: 1147-1173.
  • [47] Yan, Mengyi; Rene, Weilong; Wang, Yaoshu; Li, Jianxin. A Retrieval-Augmented Framework for Tabular Interpretation with Large Language Model. Database Systems for Advanced Applications (DASFAA 2024), Pt 2, 2025, 14851: 341-356.
  • [48] Chen, Lun-Chi; Pardeshi, Mayuresh Sunil; Liao, Yi-Xiang; Pai, Kai-Chih. Application of retrieval-augmented generation for interactive industrial knowledge management via a large language model. Computer Standards & Interfaces, 2025, 94.
  • [49] Leite, Marcus Vinicius; Abe, Jair Minoro; Souza, Marcos Leandro Hoffmann; Naas, Irenilza de Alencar. Enhancing Environmental Control in Broiler Production: Retrieval-Augmented Generation for Improved Decision-Making with Large Language Models. AgriEngineering, 2025, 7(01).
  • [50] Wang, Mengzhao; Wu, Haotian; Ke, Xiangyu; Gao, Yunjun; Xu, Xiaoliang; Chen, Lu. An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models. Proceedings of the VLDB Endowment, 2024, 17(12): 4333-4336.