Privacy-preserving dataset combination and Lasso regression for healthcare predictions

Cited by: 23
Authors
van Egmond, Marie Beth [1]
Spini, Gabriele [1]
van der Galien, Onno [5]
IJpma, Arne [6]
Veugen, Thijs [1, 3]
Kraaij, Wessel [1, 2]
Sangers, Alex [1]
Rooijakkers, Thomas [1]
Langenkamp, Peter [1]
Kamphorst, Bart [1]
van de L'Isle, Natasja [4]
Kooij-Janic, Milena [1]
Affiliations
[1] TNO Dutch Org Appl Sci Res, Unit ICT, The Hague, Netherlands
[2] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
[3] Ctr Wiskunde & Informat CWI, Cryptol Res Grp, Amsterdam, Netherlands
[4] TMC Data Sci, Eindhoven, Netherlands
[5] Achmea, Zeist, Netherlands
[6] Erasmus MC, Rotterdam, Netherlands
Funding
EU Horizon 2020
Keywords
Secure multi-party computation; Privacy; Machine learning; Lasso regression; Ridge regression
DOI
10.1186/s12911-021-01582-y
Chinese Library Classification
R-058
Abstract
Background: Recent developments in machine learning have shown its potential for clinical uses such as risk prediction, prognosis, and treatment selection. However, relevant data are often scattered across different stakeholders, and their use is regulated, e.g., by GDPR or HIPAA. As a concrete use case, the hospital Erasmus MC and the health insurance company Achmea both hold data on individuals in the city of Rotterdam, which would in theory enable them to train a regression model to identify high-impact lifestyle factors for heart failure. However, privacy and confidentiality concerns make it infeasible to exchange these data.
Methods: This article describes a solution in which vertically partitioned synthetic data of Achmea and of Erasmus MC are combined using Secure Multi-Party Computation. First, a secure inner join protocol determines the identifiers of the patients that are represented in both datasets. Then, a secure Lasso regression model is trained on the securely combined data. The involved parties thus obtain the prediction model but no further information on the input data of the other parties.
Results: We implement our secure solution and describe its performance and scalability: we can train a prediction model on two datasets with 5000 records each and a total of 30 features in less than one hour, with a minimal difference from the results of standard (non-secure) methods.
Conclusions: This article shows that it is possible to combine datasets and train a Lasso regression model on this combination in a secure way. Such a solution thus further expands the potential of privacy-preserving data analysis in the medical domain.
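The two steps named under Methods (an inner join on shared identifiers, then Lasso regression on the joined features) can be illustrated in the clear, without any cryptography. The sketch below is not the authors' protocol and provides no privacy: it only shows the non-secure computation that a secure version would have to reproduce. The names `hospital`, `insurer`, `inner_join`, and `lasso_coordinate_descent`, the synthetic data, and the choice of coordinate descent as the solver are all assumptions made for illustration; this record does not say which solver the paper uses.

```python
import random


def inner_join(left, right):
    """Join two {identifier: feature_list} tables on their shared identifiers."""
    shared = sorted(left.keys() & right.keys())
    return shared, [left[i] + right[i] for i in shared]


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, without intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical vertically partitioned data: each party holds different
# features, and only identifiers 100..199 appear in both tables.
rng = random.Random(0)
hospital = {pid: [rng.gauss(0, 1), rng.gauss(0, 1)] for pid in range(0, 200)}
insurer = {pid: [rng.gauss(0, 1), rng.gauss(0, 1)] for pid in range(100, 300)}

ids, X = inner_join(hospital, insurer)
# Synthetic target that depends on one feature from each party.
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
print(len(ids), [round(w, 3) for w in weights])
```

In this toy setup the L1 penalty shrinks the coefficients of the two features that do not enter the target towards zero, which is the property that makes Lasso useful for identifying a small set of influential factors.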
Pages: 16
Related papers
50 in total
  • [31] Privacy-preserving Decentralized Learning Framework for Healthcare System
    Kasyap, Harsh
    Tripathy, Somanath
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (02)
  • [32] Mutual Privacy-Preserving Regression Modeling in Participatory Sensing
    Xing, Kai
    Wan, Zhiguo
    Hu, Pengfei
    Zhu, Haojin
    Wang, Yuepeng
    Chen, Xi
    Wang, Yang
    Huang, Liusheng
    2013 PROCEEDINGS IEEE INFOCOM, 2013, : 3039 - 3047
  • [33] Privacy-preserving logistic regression outsourcing in cloud computing
    Zhu, Xu Dong
    Li, Hui
    Li, Feng Hua
    INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2013, 4 (2-3) : 144 - 150
  • [34] ACCESS CONTROL FOR PRIVACY-PRESERVING GAUSSIAN PROCESS REGRESSION
    Nakachi, Takayuki
    Wang, Yitu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4158 - 4162
  • [35] Scalability of Privacy-Preserving Linear Regression in Epidemiological Studies
    Kikuchi, Hiroaki
    Hashimoto, Hideki
    Yasunaga, Hideo
    Saito, Takamichi
    2015 IEEE 29th International Conference on Advanced Information Networking and Applications (IEEE AINA 2015), 2015, : 510 - 514
  • [36] Privacy-Preserving Ridge Regression on Hundreds of Millions of Records
    Nikolaenko, Valeria
    Weinsberg, Udi
    Ioannidis, Stratis
    Joye, Marc
    Boneh, Dan
    Taft, Nina
    2013 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2013, : 334 - 348
  • [37] A privacy-preserving cryptosystem for IoT E-healthcare
    Hamza, Rafik
    Yan, Zheng
    Muhammad, Khan
    Bellavista, Paolo
    Titouna, Faiza
    INFORMATION SCIENCES, 2020, 527 (527) : 493 - 510
  • [38] Privacy-Preserving Federated Learning Model for Healthcare Data
    Ul Islam, Tanzir
    Ghasemi, Reza
    Mohammed, Noman
    2022 IEEE 12TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2022, : 281 - 287
  • [39] Privacy-preserving artificial intelligence in healthcare: Techniques and applications
    Khalid, Nazish
    Qayyum, Adnan
    Bilal, Muhammad
    Al-Fuqaha, Ala
    Qadir, Junaid
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 158
  • [40] A Privacy-Preserving Data Sharing Solution for Mobile Healthcare
    Huang, Chanying
    Yan, Kedong
    Wei, Songjie
    Lee, Dong Hoon
    PROCEEDINGS OF 2017 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (PIC 2017), 2017, : 260 - 265