CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
DOI
10.1145/3701228
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. It addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, evaluating RAG systems remains a challenge. Most benchmarks focus primarily on question-answering applications and neglect other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or only the retriever in knowledge-intensive scenarios, overlooking the impact of external knowledge base construction and of the retrieval component on the full RAG pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale and more comprehensive benchmark and evaluates all components of RAG systems across a range of application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and correcting inaccuracies or inconsistencies in existing texts. "Delete" pertains to summarizing long texts into more concise forms. For each CRUD category, we develop dedicated datasets to evaluate the performance of RAG systems. We also analyze the effects of the individual components of a RAG system, such as the retriever, the context length, the construction of the knowledge base, and the LLM itself. Finally, we provide practical insights for optimizing RAG technology in different scenarios. The source code is available on GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
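As a concrete illustration of the pipeline the abstract describes (knowledge base construction, retrieval, and LLM generation), the following is a minimal sketch of a RAG loop. The toy TF-IDF retriever, the `call_llm` placeholder, and the example documents are illustrative assumptions, not the code released in the CRUD_RAG repository.

```python
# Minimal RAG pipeline sketch (illustrative only; not the CRUD_RAG benchmark code).
# A toy TF-IDF retriever stands in for the retrieval component, and call_llm()
# is a placeholder for a real LLM API call.
from collections import Counter
from math import log, sqrt


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class TfIdfRetriever:
    """Toy lexical retriever over an in-memory knowledge base."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        self.doc_tokens = [tokenize(d) for d in docs]
        df = Counter(t for toks in self.doc_tokens for t in set(toks))
        n = len(docs)
        self.idf = {t: log(n / (1 + c)) + 1.0 for t, c in df.items()}

    def _vector(self, tokens: list[str]) -> dict[str, float]:
        tf = Counter(tokens)
        return {t: tf[t] * self.idf.get(t, 0.0) for t in tf}

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        qv = self._vector(tokenize(query))

        def score(tokens: list[str]) -> float:
            dv = self._vector(tokens)
            dot = sum(qv[t] * dv.get(t, 0.0) for t in qv)
            norm = (sqrt(sum(v * v for v in qv.values()))
                    * sqrt(sum(v * v for v in dv.values()))) or 1.0
            return dot / norm

        ranked = sorted(range(len(self.docs)),
                        key=lambda i: score(self.doc_tokens[i]), reverse=True)
        return [self.docs[i] for i in ranked[:k]]


def call_llm(prompt: str) -> str:
    # Placeholder: in a real system this would call an LLM API.
    return f"[model output conditioned on a {len(prompt)}-character prompt]"


def rag_answer(query: str, retriever: TfIdfRetriever, k: int = 2) -> str:
    # Retrieve supporting passages, build a prompt, and generate an answer.
    context = "\n".join(retriever.retrieve(query, k))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)


if __name__ == "__main__":
    knowledge_base = [
        "RAG augments a language model with documents fetched from an external knowledge base.",
        "CRUD-RAG groups RAG applications into create, read, update, and delete scenarios.",
        "Retriever choice, context length, and knowledge base construction affect RAG quality.",
    ]
    print(rag_answer("What affects RAG quality?", TfIdfRetriever(knowledge_base)))
```

Swapping the toy retriever for a dense retriever, changing k, or altering how the knowledge base is chunked corresponds to the component-level factors (retriever, context length, knowledge base construction, LLM) that the benchmark analyzes.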
Pages: 32
Related Papers
50 records in total
  • [31] Wang, Dingqiao; Liang, Jiangbo; Ye, Jinguo; Li, Jingni; Li, Jingpeng; Zhang, Qikai; Hu, Qiuling; Pan, Caineng; Wang, Dongliang; Liu, Zhong; Shi, Wen; Shi, Danli; Li, Fei; Qu, Bo; Zheng, Yingfeng. Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study. JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26.
  • [32] Bhayana, Rajesh; Fawzy, Aly; Deng, Yangqing; Bleakney, Robert R.; Krishna, Satheesh. Retrieval-Augmented Generation for Large Language Models in Radiology: Another Leap Forward in Board Examination Performance. RADIOLOGY, 2024, 313 (01).
  • [33] Kim, Gangwoo; Kim, Sungdong; Jeon, Byeongguk; Park, Joonsuk; Kang, Jaewoo. Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models. 2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 996-1009.
  • [34] Di Palma, Dario. Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models. PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023: 1369-1373.
  • [35] Vizniuk, Artem; Diachenko, Grygorii; Laktionov, Ivan; Siwocha, Agnieszka; Xiao, Min; Smolag, Jacek. A Comprehensive Survey of Retrieval-Augmented Large Language Models for Decision Making in Agriculture: Unsolved Problems and Research Opportunities. JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2025, 15 (02): 115-146.
  • [36] Lee, Jungwon; Ahn, Seungjun; Kim, Daeho; Kim, Dongkyun. Performance comparison of retrieval-augmented generation and fine-tuned large language models for construction safety management knowledge retrieval. AUTOMATION IN CONSTRUCTION, 2024, 168.
  • [37] Li, Jiangtong; Lei, Yang; Bian, Yuxuan; Cheng, Dawei; Ding, Zhijun; Jiang, Changjun. RA-CFGPT: Chinese financial assistant with retrieval-augmented large language model. FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (05).
  • [38] Hindi, Mahd; Mohammed, Linda; Maaz, Ommama; Alwarafy, Abdulmalik. Enhancing the Precision and Interpretability of Retrieval-Augmented Generation (RAG) in Legal Technology: A Survey. IEEE ACCESS, 2025, 13: 46171-46189.
  • [39] Yang, Kaiyu; Swope, Aidan M.; Gu, Alex; Chalamala, Rahul; Song, Peiyang; Yu, Shixing; Godil, Saad; Prenger, Ryan; Anandkumar, Anima. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [40] Muludi, Kurnia; Fitria, Kaira Milani; Triloka, Joko; Sutedi. Retrieval-Augmented Generation Approach: Document Question Answering using Large Language Model. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (03): 776-785.