Large language models: a new approach for privacy policy analysis at scale

Cited by: 0
Authors
Rodriguez, David [1]
Yang, Ian [2]
Del Alamo, Jose M. [1]
Sadeh, Norman [2]
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, Madrid, Spain
[2] Carnegie Mellon Univ, Sch Comp Sci, Forbes Ave, Pittsburgh, PA 15213 USA
Funding
US National Science Foundation
Keywords
Large language models; Natural language processing; Privacy policies; Data protection; Privacy; Feature extraction;
DOI
10.1007/s00607-024-01331-9
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The number and dynamic nature of web sites and mobile applications present regulators and app store operators with significant challenges when it comes to enforcing compliance with applicable privacy and data protection laws. Over the past several years, people have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer people's privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora necessary to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools. This includes engineering prompts and evaluating different configurations of these LLM techniques. Evaluation of the resulting techniques on well-known corpora of privacy policy annotations yields an F1 score exceeding 93%. This score is higher than scores reported earlier in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost required to train the foundational models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.
Pages: 3879-3903
Page count: 25