Large language models: a new approach for privacy policy analysis at scale

Authors
Rodriguez, David [1]
Yang, Ian [2]
Del Alamo, Jose M. [1]
Sadeh, Norman [2]
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, Madrid, Spain
[2] Carnegie Mellon Univ, Sch Comp Sci, Forbes Ave, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (NSF), USA
Keywords
Large language models; Natural language processing; Privacy policies; Data protection; Privacy; Feature extraction
DOI
10.1007/s00607-024-01331-9
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The sheer number and dynamic nature of websites and mobile applications present regulators and app store operators with significant challenges in enforcing compliance with applicable privacy and data protection laws. Over the past several years, researchers have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer users' privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora needed to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools. This work includes prompt engineering and the evaluation of different configurations of these LLM-based techniques. Evaluated on well-known corpora of privacy policy annotations, the resulting techniques achieve F1 scores exceeding 93%, higher than those previously reported in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost of training the foundation models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.
Pages: 3879-3903
Number of pages: 25