Large language models: a new approach for privacy policy analysis at scale

Times cited: 0
Authors
Rodriguez, David [1]
Yang, Ian [2]
Del Alamo, Jose M. [1]
Sadeh, Norman [2]
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, Madrid, Spain
[2] Carnegie Mellon Univ, Sch Comp Sci, Forbes Ave, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
Keywords
Large language models; Natural language processing; Privacy policies; Data protection; Privacy; Feature extraction
DOI
10.1007/s00607-024-01331-9
Chinese Library Classification number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
The number and dynamic nature of web sites and mobile applications present regulators and app store operators with significant challenges when it comes to enforcing compliance with applicable privacy and data protection laws. Over the past several years, people have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer people's privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora necessary to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools. This includes engineering prompts and evaluating different configurations of these LLM techniques. Evaluation of the resulting techniques on well-known corpora of privacy policy annotations yields an F1 score exceeding 93%. This score is higher than scores reported earlier in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost required to train the foundational models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.
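The abstract reports results as an F1 score over categorized data practice disclosures. As an illustration only, the sketch below computes a micro-averaged F1 for multi-label classification of policy segments, where each segment may carry several categories. The averaging scheme, the function name `micro_f1`, and the category labels are assumptions for illustration; the abstract does not state which averaging the authors used or list their label set.

```python
# Minimal sketch (assumption): micro-averaged F1 over (segment, category) pairs.
# Category names are hypothetical placeholders, not the paper's label set.
from typing import Dict, Set


def micro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Pool true/false positives and false negatives across all segments."""
    tp = fp = fn = 0
    for seg_id in gold.keys() | pred.keys():
        g = gold.get(seg_id, set())
        p = pred.get(seg_id, set())
        tp += len(g & p)   # categories predicted and annotated
        fp += len(p - g)   # categories predicted but not annotated
        fn += len(g - p)   # categories annotated but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations and model predictions for three segments.
gold = {
    "s1": {"first_party_collection"},
    "s2": {"third_party_sharing", "data_retention"},
    "s3": set(),
}
pred = {
    "s1": {"first_party_collection"},
    "s2": {"third_party_sharing"},
    "s3": {"data_retention"},
}
print(round(micro_f1(gold, pred), 3))  # 2 TP, 1 FP, 1 FN -> 0.667
```

Micro-averaging weights every (segment, category) decision equally, so frequent categories dominate the score; a macro-averaged F1 (mean of per-category F1) would instead weight rare categories equally.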
Pages: 3879-3903
Number of pages: 25
Related papers
50 in total
  • [41] Multimodal large language models for bioimage analysis
    Zhang, Shanghang
    Dai, Gaole
    Huang, Tiejun
    Chen, Jianxu
    NATURE METHODS, 2024, 21 (08) : 1390 - 1393
  • [42] KCL: A Declarative Language for Large-Scale Configuration and Policy Management
    Duo, XiaoDong
    Xu, Pengfei
    Zhang, Zheng
    Chai, Shushan
    Xia, Rui
    Zong, Zhe
    DEPENDABLE SOFTWARE ENGINEERING. THEORIES, TOOLS, AND APPLICATIONS, SETTA, 2022, 13649 : 88 - 105
  • [43] A Policy on the Use of Artificial Intelligence and Large Language Models in Peer Review
    Munafo, Marcus
    NICOTINE & TOBACCO RESEARCH, 2023, 26 (05) : 519 - 519
  • [44] Aligning Large Language Models by On-Policy Self-Judgment
    Lee, Sangkyu
    Kim, Sungdong
    Yousefpour, Ashkan
    Seo, Minjoon
    Yoo, Kang Min
    Yu, Youngjae
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 11442 - 11459
  • [45] Security Policy Generation and Verification through Large Language Models: A proposal
    Martinelli, Fabio
    Mercaldo, Francesco
    Petrillo, Luca
    Santone, Antonella
    PROCEEDINGS OF THE FOURTEENTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY, CODASPY 2024, 2024, : 143 - 145
  • [46] Large language models: a new chapter in digital health
    Author unknown
    LANCET DIGITAL HEALTH, 2024, 6 (01): : e1 - e1
  • [47] ALCUNA: Large Language Models Meet New Knowledge
    Yin, Xunjian
    Huang, Baizhou
    Wan, Xiaojun
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 1397 - 1414
  • [49] Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy
    Bozkir, Efe
    Ozdel, Suleyman
    Lau, Ka Hei Carrie
    Wang, Mengdi
    Gao, Hong
    Kasneci, Enkelejda
    PROCEEDINGS OF THE 6TH CONFERENCE ON ACM CONVERSATIONAL USER INTERFACES, CUI 2024, 2024,
  • [50] Selective privacy-preserving framework for large language models fine-tuning
    Wang, Teng
    Zhai, Lindong
    Yang, Tengfei
    Luo, Zhucheng
    Liu, Shuanggen
    INFORMATION SCIENCES, 2024, 678