The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Cited by: 0
Authors
Zeng, Shenglai [1]
Zhang, Jiankun [3, 4, 5]
He, Pengfei [1]
Xing, Yue [1]
Liu, Yiding [2]
Xu, Han [1]
Ren, Jie [1]
Wang, Shuaiqiang [2]
Yin, Dawei [2]
Chang, Yi [3, 4, 5]
Tang, Jiliang [1]
Affiliations
[1] Michigan State Univ, E Lansing, MI 48824 USA
[2] Baidu Inc, Beijing, Peoples R China
[3] Jilin Univ, Sch Artificial Intelligence, Jilin, Jilin, Peoples R China
[4] Jilin Univ, Int Ctr Future Sci, Jilin, Jilin, Peoples R China
[5] MOE, Engn Res Ctr Knowledge Driven Human Machine Intel, Beijing, Peoples R China
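Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code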
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language models with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-explored. In this work, we conduct extensive empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database. Despite the new risk brought by RAG on the retrieval data, we further reveal that RAG can mitigate the leakage of the LLMs' training data. Overall, we provide new insights in this paper for privacy protection of retrieval-augmented LLMs, which benefit both LLM and RAG system builders. Our code is available at https://github.com/phycholosogy/RAG-privacy.
Pages: 4505-4524
Page count: 20
Related papers
50 in total
  • [1] The Journey to A Knowledgeable Assistant with Retrieval-Augmented Generation (RAG)
    Dong, Xin Luna
    COMPANION OF THE 2024 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD-COMPANION 2024, 2024, : 3 - 3
  • [2] The Journey to A Knowledgeable Assistant with Retrieval-Augmented Generation (RAG)
    Dong, Xin Luna
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 4 - 4
  • [3] Enhancing the Precision and Interpretability of Retrieval-Augmented Generation (RAG) in Legal Technology: A Survey
    Hindi, Mahd
    Mohammed, Linda
    Maaz, Ommama
    Alwarafy, Abdulmalik
    IEEE ACCESS, 2025, 13 : 46171 - 46189
  • [4] Evaluating Retrieval Quality in Retrieval-Augmented Generation
    Salemi, Alireza
    Zamani, Hamed
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2395 - 2400
  • [5] Benchmarking Retrieval-Augmented Generation for Medicine
    Xiong, Guangzhi
    Jin, Qiao
    Lu, Zhiyong
    Zhang, Aidong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 6233 - 6251
  • [6] CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
    Lyu, Yuanjie
    Li, Zhiyu
    Niu, Simin
    Xiong, Feiyu
    Tang, Bo
    Wang, Wenjin
    Wu, Hao
    Liu, Huanyong
    Xu, Tong
    Chen, Enhong
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)
  • [7] OG-RAG: ONTOLOGY-GROUNDED RETRIEVAL-AUGMENTED GENERATION FOR LARGE LANGUAGE MODELS
    Sharma, Kartik
    Kumar, Peeyush
    Li, Yunqing
    arXiv,
  • [8] Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization
    Zamani, Hamed
    Bendersky, Michael
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2641 - 2646
  • [9] AI-Enhanced Social Work: Developing and Evaluating Retrieval-Augmented Generation (RAG) Support Systems
    Perron, Brian E.
    Hiltz, Barbara S.
    Khang, Erin M.
    Savas, Sue Ann
    JOURNAL OF SOCIAL WORK EDUCATION, 2025, 61 (01) : 3 - 13
  • [10] QuIM-RAG: Advancing Retrieval-Augmented Generation With Inverted Question Matching for Enhanced QA Performance
    Saha, Binita
    Saha, Utsha
    Malik, Muhammad Zubair
    IEEE ACCESS, 2024, 12 : 185401 - 185410