CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Cited by: 1
Authors
Lyu, Yuanjie [1 ]
Li, Zhiyu [2 ]
Niu, Simin [3 ]
Xiong, Feiyu [2 ]
Tang, Bo [2 ]
Wang, Wenjin [2 ]
Wu, Hao [2 ]
Liu, Huanyong [4 ]
Xu, Tong [1 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Inst Adv Algorithms Res Shanghai, Shanghai, Peoples R China
[3] Renmin Univ China, Beijing, Peoples R China
[4] 360 AI Res Inst, Beijing, Peoples R China
DOI
10.1145/3701228
CLC number
TP [Automation and Computer Technology]
Subject classification code
0812
Abstract
Retrieval-augmented generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources, addressing common LLM limitations such as outdated information and the tendency to produce inaccurate, "hallucinated" content. However, evaluating RAG systems remains challenging. Most benchmarks focus primarily on question answering, neglecting other scenarios where RAG could be beneficial. Consequently, their experiments often assess only the LLM component of the RAG pipeline, or only the retriever in knowledge-intensive scenarios, overlooking how knowledge base construction and the retrieval component affect the entire pipeline in non-knowledge-intensive scenarios. To address these issues, this article constructs a large-scale and more comprehensive benchmark and evaluates all components of RAG systems across a range of application scenarios. Specifically, drawing on the CRUD actions that describe interactions between users and knowledge bases, we categorize RAG applications into four distinct types: create, read, update, and delete (CRUD). "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves answering intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to summarizing extensive texts into more concise forms. For each CRUD category, we develop datasets to evaluate the performance of RAG systems. We also analyze how the components of the RAG system, such as the retriever, context length, knowledge base construction, and the LLM, affect overall performance. Finally, we provide useful insights for optimizing RAG technology for different scenarios. The source code is available on GitHub: https://github.com/IAAR-Shanghai/CRUD_RAG.
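The components the abstract evaluates (knowledge base construction, the retriever, and the LLM) fit together in a standard pipeline. The toy sketch below illustrates that structure only; it is not the CRUD-RAG implementation, and every function name and parameter (`build_knowledge_base`, `retrieve`, `generate`, `chunk_size`, `top_k`) is illustrative. The word-overlap scorer stands in for a real dense or BM25 retriever, and the generator returns the assembled prompt instead of calling an actual LLM.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_knowledge_base(documents, chunk_size=40):
    """Split documents into fixed-size character chunks (one knowledge-base
    construction choice; chunking strategy affects the whole pipeline)."""
    return [doc[i:i + chunk_size]
            for doc in documents
            for i in range(0, len(doc), chunk_size)]

def retrieve(query, chunks, top_k=2):
    """Rank chunks by word overlap with the query -- a crude stand-in
    for a real dense or lexical retriever."""
    q = tokenize(query)
    scored = sorted(((len(q & tokenize(c)), c) for c in chunks), reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

def generate(query, context, llm=None):
    """Assemble the retrieval-augmented prompt; a real pipeline would
    pass it to an LLM here."""
    prompt = "Context:\n" + "\n".join(context) + f"\nQuestion: {query}\nAnswer:"
    return llm(prompt) if llm else prompt

docs = ["RAG augments large language models with retrieved external knowledge."]
kb = build_knowledge_base(docs)
ctx = retrieve("What is RAG?", kb)
prompt = generate("What is RAG?", ctx)
```

Each stage here corresponds to a variable the benchmark studies: `chunk_size` to knowledge base construction, `top_k` and the scorer to the retriever and context length, and the `llm` callable to the generation model.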
Pages: 32