Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study

Cited by: 3
Authors
Syleouni, Maria-Eleni [1 ,2 ]
Karavasiloglou, Nena [1 ,3 ]
Manduchi, Laura [4 ]
Wanner, Miriam [2 ]
Korol, Dimitri [2 ]
Ortelli, Laura [5 ]
Bordoni, Andrea [5 ]
Rohrmann, Sabine [1 ,2 ,6 ]
Affiliations
[1] Univ Zurich, Epidemiol Biostat & Prevent Inst, Div Chron Dis Epidemiol, Zurich, Switzerland
[2] Univ Hosp Zurich, Canc Registry Zurich Zug Schaffhausen & Schwyz, Zurich, Switzerland
[3] European Food Safety Author, Parma, Italy
[4] Swiss Fed Inst Technol, Med Data Sci, Zurich, Switzerland
[5] Ticino Canc Registry, Publ Hlth Div Canton Ticino, Locarno, Switzerland
[6] Univ Zurich, Epidemiol Biostat & Prevent Inst, Hirschengraben 84, CH-8001 Zurich, Switzerland
Keywords
breast cancer; cancer registry; machine learning; prediction; second cancer; RISK-FACTORS; LOCAL RECURRENCE; PROGNOSIS;
DOI
10.1002/ijc.34568
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214 ;
Abstract
Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB) were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it.
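Illustrative note: the abstract reports model performance as an area under the ROC curve (AUC) with a 95% confidence interval. The sketch below shows one common way such a figure can be computed. It is not the authors' code: the percentile bootstrap, the helper names `roc_auc` and `bootstrap_ci`, the sample size, and the simulated risk scores are all assumptions made for illustration, and only the roughly 6% event rate is taken from the abstract.

```python
import random


def roc_auc(labels, scores):
    """AUC as the chance that a random positive case scores above a random
    negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method; the
    abstract does not say how its intervals were obtained)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated cohort (hypothetical): about 6% events, as in the abstract.
rng = random.Random(42)
labels = [1 if rng.random() < 0.06 else 0 for _ in range(600)]
labels[0], labels[1] = 1, 0  # guarantee both classes are present
# Simulated risk scores: cases with an event score higher on average.
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {auc:.2f} (95% CI {low:.2f}-{high:.2f})")
```

With an event rate this low, each bootstrap resample contains few positive cases, which is one reason an interval on a smaller external dataset tends to be wider than one on the training data.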
Pages: 932-941
Number of pages: 10