The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Cited by: 0
Authors
Zeng, Shenglai [1]
Zhang, Jiankun [3, 4, 5]
He, Pengfei [1]
Xing, Yue [1]
Liu, Yiding [2]
Xu, Han [1]
Ren, Jie [1]
Wang, Shuaiqiang [2]
Yin, Dawei [2]
Chang, Yi [3, 4, 5]
Tang, Jiliang [1]
Affiliations
[1] Michigan State Univ, E Lansing, MI 48824 USA
[2] Baidu Inc, Beijing, Peoples R China
[3] Jilin Univ, Sch Artificial Intelligence, Jilin, Jilin, Peoples R China
[4] Jilin Univ, Int Ctr Future Sci, Jilin, Jilin, Peoples R China
[5] MOE, Engn Res Ctr Knowledge Driven Human Machine Intel, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Retrieval-augmented generation (RAG) is a powerful technique for augmenting language models with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-explored. In this work, we conduct extensive empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database. Despite the new risk that RAG brings to the retrieval data, we further reveal that RAG can mitigate the leakage of the LLMs' training data. Overall, we provide new insights in this paper for privacy protection of retrieval-augmented LLMs, which benefit builders of both LLMs and RAG systems. Our code is available at https://github.com/phycholosogy/RAG-privacy.
Pages: 4505-4524
Page count: 20