Privacy-preserving dataset combination and Lasso regression for healthcare predictions

Cited by: 23
Authors
van Egmond, Marie Beth [1]
Spini, Gabriele [1]
van der Galien, Onno [5]
IJpma, Arne [6]
Veugen, Thijs [1, 3]
Kraaij, Wessel [1, 2]
Sangers, Alex [1]
Rooijakkers, Thomas [1]
Langenkamp, Peter [1]
Kamphorst, Bart [1]
van de L'Isle, Natasja [4]
Kooij-Janic, Milena [1]
Affiliations
[1] TNO Dutch Org Appl Sci Res, Unit ICT, The Hague, Netherlands
[2] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
[3] Ctr Wiskunde & Informat CWI, Cryptol Res Grp, Amsterdam, Netherlands
[4] TMC Data Sci, Eindhoven, Netherlands
[5] Achmea, Zeist, Netherlands
[6] Erasmus MC, Rotterdam, Netherlands
Funding
European Union Horizon 2020
Keywords
Secure multi-party computation; Privacy; Machine learning; Lasso regression; Ridge regression
DOI
10.1186/s12911-021-01582-y
Chinese Library Classification number
R-058
Abstract
Background: Recent developments in machine learning have shown its potential impact for clinical use, such as risk prediction, prognosis, and treatment selection. However, relevant data are often scattered across different stakeholders and their use is regulated, e.g., by GDPR or HIPAA. As a concrete use case, the hospital Erasmus MC and the health insurance company Achmea both hold data on individuals in the city of Rotterdam, which would in theory enable them to train a regression model to identify high-impact lifestyle factors for heart failure. However, privacy and confidentiality concerns make it infeasible to exchange these data.

Methods: This article describes a solution in which vertically partitioned synthetic data of Achmea and of Erasmus MC are combined using Secure Multi-Party Computation. First, a secure inner join protocol determines the identifiers of the patients that are represented in both datasets. Then, a secure Lasso regression model is trained on the securely combined data. The involved parties thus obtain the prediction model but no further information on the input data of the other parties.

Results: We implement our secure solution and describe its performance and scalability: we can train a prediction model on two datasets with 5000 records each and a total of 30 features in less than one hour, with a minimal difference from the results of standard (non-secure) methods.

Conclusions: This article shows that it is possible to combine datasets and train a Lasso regression model on this combination in a secure way. Such a solution thus further expands the potential of privacy-preserving data analysis in the medical domain.
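The two steps named in the Methods (an inner join on shared patient identifiers, followed by Lasso regression on the joined, vertically partitioned features) can be illustrated in plaintext. The sketch below is not the authors' protocol: it performs no secure multi-party computation, and every name and value in it (`hospital`, `insurer`, `outcome`, `inner_join`, `lasso_coordinate_descent`, the toy records) is a hypothetical chosen for illustration. It swaps in ordinary coordinate descent with soft-thresholding as one common way to fit a Lasso model.

```python
# Plaintext (non-secure) illustration only. The article runs these steps under
# secure multi-party computation; this sketch makes no attempt at that.

def inner_join(left, right):
    """Join two vertically partitioned tables on their shared identifiers.

    Each table maps an identifier to a list of feature values.
    """
    shared = sorted(set(left) & set(right))
    return shared, [left[i] + right[i] for i in shared]


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the Lasso shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=50):
    """Minimise (1/(2n)) * sum (y - x.w)^2 + penalty * sum |w_j|."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for x, y in zip(rows, targets):
                without_j = sum(x[k] * weights[k] for k in range(d) if k != j)
                rho += x[j] * (y - without_j)
                norm += x[j] * x[j]
            if norm == 0.0:
                weights[j] = 0.0
            else:
                weights[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return weights


# Hypothetical toy data: each party holds one feature per patient identifier.
hospital = {"p1": [1.0], "p2": [-1.0], "p3": [1.0], "p4": [-1.0], "p9": [5.0]}
insurer = {"p0": [3.0], "p1": [1.0], "p2": [1.0], "p3": [-1.0], "p4": [-1.0]}
# Hypothetical outcome, constructed to depend only on the hospital's feature.
outcome = {"p1": 2.0, "p2": -2.0, "p3": 2.0, "p4": -2.0, "p9": 10.0}

shared, rows = inner_join(hospital, insurer)
targets = [outcome[i] for i in shared]
weights = lasso_coordinate_descent(rows, targets, penalty=0.5)
print(shared, weights)
```

In this toy setup only the four identifiers present in both tables survive the join, and the L1 penalty drives the weight of the insurer's feature, which does not influence the outcome, to zero while shrinking the weight of the hospital's feature below its unpenalised value of 2.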
Pages: 16