Large language models: a new approach for privacy policy analysis at scale

Cited by: 0
Authors
Rodriguez, David [1 ]
Yang, Ian [2 ]
Del Alamo, Jose M. [1 ]
Sadeh, Norman [2 ]
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, Madrid, Spain
[2] Carnegie Mellon Univ, Sch Comp Sci, Forbes Ave, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA);
Keywords
Large language models; Natural language processing; Privacy policies; Data protection; Privacy; Feature extraction;
DOI
10.1007/s00607-024-01331-9
CLC number
TP301 [Theory and Methods];
Discipline code
081202;
Abstract
The number and dynamic nature of websites and mobile applications present regulators and app store operators with significant challenges when it comes to enforcing compliance with applicable privacy and data protection laws. Over the past several years, people have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer people's privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora necessary to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools. This includes prompt engineering and the evaluation of different configurations of these LLM techniques. Evaluation of the resulting techniques on well-known corpora of privacy policy annotations yields an F1 score exceeding 93%, higher than scores reported earlier in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost required to train the foundational models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.
Pages: 3879-3903
Number of pages: 25
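To make the approach described in the abstract more concrete, below is a minimal, illustrative Python sketch (not the authors' actual prompts, models, or evaluation pipeline): it asks an LLM, via the OpenAI chat completions API, to assign a privacy policy segment to one of a handful of OPP-115-style data practice categories, then scores predictions against gold annotations with a macro F1. The category list, prompt wording, model name (gpt-4o-mini), and helper names (PROMPT_TEMPLATE, classify_segment) are assumptions introduced here for illustration only.

```python
# Illustrative sketch only: classify privacy-policy segments with an LLM
# and score the predictions against gold annotations.
# Assumes the openai>=1.0 Python client, scikit-learn, and an
# OPENAI_API_KEY set in the environment.
from openai import OpenAI
from sklearn.metrics import f1_score

# Hypothetical category set, loosely inspired by the OPP-115 annotation scheme.
CATEGORIES = [
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    "Data Retention",
    "Data Security",
    "User Choice/Control",
    "Other",
]

PROMPT_TEMPLATE = (
    "You are annotating privacy policies. Classify the following policy "
    "segment into exactly one of these categories: {categories}.\n"
    "Answer with the category name only.\n\nSegment:\n{segment}"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_segment(segment: str, model: str = "gpt-4o-mini") -> str:
    """Ask the LLM for a single data practice label for one policy segment."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep outputs as deterministic as possible for evaluation
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(
                categories=", ".join(CATEGORIES), segment=segment
            ),
        }],
    )
    answer = response.choices[0].message.content.strip()
    # Fall back to "Other" if the model returns an unexpected label.
    return answer if answer in CATEGORIES else "Other"


if __name__ == "__main__":
    # Toy gold-annotated segments; a real evaluation would run over a full
    # annotated corpus such as OPP-115.
    segments = [
        ("We collect your email address when you create an account.",
         "First Party Collection/Use"),
        ("We may share usage data with advertising partners.",
         "Third Party Sharing/Collection"),
    ]
    gold = [label for _, label in segments]
    pred = [classify_segment(text) for text, _ in segments]
    print("Macro F1:", f1_score(gold, pred, average="macro"))
```

An evaluation along the lines reported in the abstract would run such a classifier over a full annotated benchmark corpus and compare different prompts, models, and configurations against previously published baselines.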