The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Cited by: 0
Authors
Zeng, Shenglai [1]
Zhang, Jiankun [3, 4, 5]
He, Pengfei [1]
Xing, Yue [1]
Liu, Yiding [2]
Xu, Han [1]
Ren, Jie [1]
Wang, Shuaiqiang [2]
Yin, Dawei [2]
Chang, Yi [3, 4, 5]
Tang, Jiliang [1]
Affiliations
[1] Michigan State Univ, E Lansing, MI 48824 USA
[2] Baidu Inc, Beijing, Peoples R China
[3] Jilin Univ, Sch Artificial Intelligence, Jilin, Jilin, Peoples R China
[4] Jilin Univ, Int Ctr Future Sci, Jilin, Jilin, Peoples R China
[5] MOE, Engn Res Ctr Knowledge Driven Human Machine Intel, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language models with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-explored. In this work, we conduct extensive empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database. Despite the new risk brought by RAG on the retrieval data, we further reveal that RAG can mitigate the leakage of the LLMs' training data. Overall, we provide new insights in this paper for privacy protection of retrieval-augmented LLMs, which benefit both LLM and RAG system builders. Our code is available at https://github.com/phycholosogy/RAG-privacy.
Pages: 4505-4524
Page count: 20
Related papers
50 in total
  • [31] Retrieval-Augmented Generation: Advancing personalized care and research in oncology
    Zarfati, Mor
    Soffer, Shelly
    Nadkarni, Girish N.
    Klang, Eyal
    EUROPEAN JOURNAL OF CANCER, 2025, 220
  • [32] An advanced retrieval-augmented generation system for manufacturing quality control
    Alvaro, Jose Antonio Heredia
    Barreda, Javier Gonzalez
    ADVANCED ENGINEERING INFORMATICS, 2025, 64
  • [33] Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation
    Han, Zifei FeiFei
    Lin, Jionghao
    Gurung, Ashish
    Thomas, Danielle R.
    Chen, Eason
    Borchers, Conrad
    Gupta, Shivang
    Koedinger, Kenneth R.
    AI FOR EDUCATION WORKSHOP, 2024, 257 : 66 - 76
  • [34] Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems
    Ndimbo, Edmund V.
    Luo, Qin
    Fernando, Gimo C.
    Yang, Xu
    Wang, Bang
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [35] LLM-based and Retrieval-Augmented Control Code Generation
    Koziolek, Heiko
    Gruener, Sten
    Hark, Rhaban
    Ashiwal, Virendra
    Linsbauer, Sofia
    Eskandani, Nafise
    2024 INTERNATIONAL WORKSHOP ON LARGE LANGUAGE MODELS FOR CODE, LLM4CODE 2024, 2024: 22 - 29
  • [36] Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
    Shao, Zhihong
    Gong, Yeyun
    Shen, Yelong
    Huang, Minlie
    Duan, Nan
    Chen, Weizhu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 9248 - 9274
  • [37] GastroBot: a Chinese gastrointestinal disease chatbot based on the retrieval-augmented generation
    Zhou, Qingqing
    Liu, Can
    Duan, Yuchen
    Sun, Kaijie
    Li, Yu
    Kan, Hongxing
    Gu, Zongyun
    Shu, Jianhua
    Hu, Jili
    FRONTIERS IN MEDICINE, 2024, 11
  • [38] Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
    Xu, Zhentao
    Cruz, Mark Jerome
    Guevara, Matthew
    Wang, Tie
    Deshpande, Manasi
    Wang, Xiaofeng
    Li, Zheng
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024: 2905 - 2909
  • [39] FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation
    Hofstatter, Sebastian
    Chen, Jiecao
    Raman, Karthik
    Zamani, Hamed
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023: 1437 - 1447
  • [40] Evaluating Retrieval-Augmented Generation Models for Financial Report Question and Answering
    Iaroshev, Ivan
    Pillai, Ramalingam
    Vaglietti, Leandro
    Hanne, Thomas
    APPLIED SCIENCES-BASEL, 2024, 14 (20):