Distributed learning on 20 000+ lung cancer patients - The Personal Health Train

Cited by: 89
Authors
Deist, Timo M. [1, 2]
Dankers, Frank J. W. M. [1, 3]
Ojha, Priyanka [4]
Marshall, M. Scott [4]
Janssen, Tomas [4]
Faivre-Finn, Corinne [5]
Masciocchi, Carlotta [7]
Valentini, Vincenzo [6, 7]
Wang, Jiazhou [8]
Chen, Jiayan [8]
Zhang, Zhen [8]
Spezi, Emiliano [9, 10]
Button, Mick [10]
Nuyttens, Joost Jan [1, 11]
Vernhout, Rene [11]
van Soest, Johan
Jochems, Arthur [2]
Monshouwer, Rene [3]
Bussink, Johan [3]
Price, Gareth [5]
Lambin, Philippe [2]
Dekker, Andre [1]
Affiliations
[1] Maastricht Univ Med Ctr, GROW Sch Oncol & Dev Biol, Dept Radiat Oncol MAASTRO, Maastricht, Netherlands
[2] Maastricht Univ Med Ctr, GROW Sch Oncol & Dev Biol, D Lab Dept Precis Med, Maastricht, Netherlands
[3] Radboud Univ Nijmegen, Med Ctr, Dept Radiat Oncol, Nijmegen, Netherlands
[4] Netherlands Canc Inst Antoni van Leeuwenhoek, Dept Radiat Oncol, Amsterdam, Netherlands
[5] Univ Manchester, Manchester Acad Hlth Sci Ctr, Christie NHS Fdn Trust, Manchester, Lancs, England
[6] Univ Cattolica Sacro Cuore, Milan, Italy
[7] Fdn Policlin Univ A Gemelli IRCCS, Rome, Italy
[8] Fudan Univ, Shanghai Canc Ctr, Dept Radiat Oncol, Dept Oncol, Shanghai Med Coll, Shanghai, Peoples R China
[9] Cardiff Univ, Sch Engn, Cardiff, Wales
[10] Velindre Canc Ctr, Cardiff, Wales
[11] Erasmus MC, Canc Inst, Dept Radiat Oncol, Rotterdam, Netherlands
Funding
EU Horizon 2020
Keywords
Lung cancer; Big data; Distributed learning; Federated learning; Machine learning; Survival analysis; Prediction modeling; FAIR data; CARE;
DOI
10.1016/j.radonc.2019.11.019
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute.
Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots.
Results: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015.
Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimens. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.
© 2019 The Authors. Published by Elsevier B.V.
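The abstract states that only summary statistics and regression coefficients travel between each institute and the central server. The following is a minimal sketch of that idea in Python, not the authors' implementation (theirs is in MATLAB and may use a different optimization scheme). It assumes plain gradient averaging: each site returns a summed gradient and a patient count, and the server updates the shared coefficients. The function names, the synthetic data, and the hyperparameters (`rounds`, `lr`) are all hypothetical, and the sites are simulated in one process rather than contacted over a network.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_gradient(weights, rows):
    """Runs inside one institute. Returns only aggregates: the summed
    log-loss gradient and the number of patients, never individual rows."""
    grad = [0.0] * len(weights)
    for features, survived in rows:
        x = [1.0] + list(features)  # prepend the intercept term
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - survived) * xj
    return grad, len(rows)


def train_distributed(sites, n_features, rounds=500, lr=0.5):
    """Central server loop (simulated in-process): send the current
    coefficients to every site, collect the per-site aggregates, and take
    one gradient step on the combined average."""
    weights = [0.0] * (n_features + 1)
    for _ in range(rounds):
        replies = [local_gradient(weights, rows) for rows in sites]
        total = sum(n for _, n in replies)
        for j in range(len(weights)):
            weights[j] -= lr * sum(g[j] for g, _ in replies) / total
    return weights


def make_site(n, rng):
    """Synthetic stand-in for one institute's local database (hypothetical data)."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        p = sigmoid(1.5 * x - 0.5)
        rows.append(((x,), 1 if rng.random() < p else 0))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [make_site(200, rng) for _ in range(3)]
    print(train_distributed(sites, n_features=1))
```

Because the sum of the per-site gradients equals the gradient over the pooled rows, this scheme should reach the same coefficients as training on centralized data, up to floating-point rounding, while each site's rows stay local.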
Pages: 189-200
Number of pages: 12