A web-based tool for cancer risk prediction for middle-aged and elderly adults using machine learning algorithms and self-reported questions

Cited by: 0
Authors
Xiao, Xingjian [1]
Yi, Xiaohan [1]
Soe, Nyi Nyi [2, 3]
Latt, Phyu Mon [2, 3]
Lin, Luotao [4]
Chen, Xuefen [1]
Song, Hualing [1]
Sun, Bo [5]
Zhao, Hailei [1]
Xu, Xianglong [1, 2, 3, 6, 7]
Affiliations
[1] Shanghai Univ Tradit Chinese Med, Sch Publ Hlth, Shanghai, Peoples R China
[2] Monash Univ, Fac Med Nursing & Hlth Sci, Sch Translat Med, Clayton, Vic, Australia
[3] Alfred Hlth, Melbourne Sexual Hlth Ctr, Artificial Intelligence & Modelling Epidemiol Prog, Carlton, Vic, Australia
[4] Univ New Mexico, Dept Individual Family & Community Educ, Nutr & Dietet Program, Albuquerque, NM USA
[5] Shanghai Univ Tradit Chinese Med, LongHua Hosp, Endoscopy Ctr, Shanghai, Peoples R China
[6] Shanghai Univ Tradit Chinese Med, Bijie Inst, Bijie, Peoples R China
[7] Bijie Dist Ctr Dis Control & Prevent, Doctoral Workstat, Bijie, Peoples R China
Keywords
Cancer; Pan-cancer; Prediction; Web-based; Risk; Co-management; Co-prevention; Middle-aged; China; Machine learning; HEALTH
DOI
10.1016/j.annepidem.2024.12.003
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Background: From a global perspective, China is one of the countries with higher incidence and mortality rates for cancer.
Objective: Our objective is to create an online cancer risk prediction tool for middle-aged and elderly Chinese adults by leveraging machine learning algorithms and self-reported data.
Method: Drawing from a cohort of 19,798 participants aged 45 and above from the China Health and Retirement Longitudinal Study (2011-2018), we employed nine machine learning algorithms (LR: Logistic Regression, Adaboost: Adaptive Boosting, SVM: Support Vector Machine, RF: Random Forest, GNB: Gaussian Naive Bayes, GBM: Gradient Boosting Machine, LGBM: Light Gradient Boosting Machine, XGBoost: eXtreme Gradient Boosting, KNN: K-Nearest Neighbors), which are mainly used for classification and regression tasks, to construct predictive models for various cancers. Utilizing non-invasive self-reported predictors encompassing demographic, educational, marital, lifestyle, health history, and other factors, we focused on predicting "Cancer or Malignant Tumour" outcomes. The types of cancers that can be predicted mainly include lung cancer, breast cancer, cervical cancer, colorectal cancer, gastric cancer, esophageal cancer, and other rare cancers.
Results: The developed tool, MyCancerRisk, demonstrated significant performance, with the Random Forest algorithm achieving an AUC of 0.75 and ACC of 0.99 using self-reported variables. Key predictors identified include age, self-rated health, sleep patterns, household heating sources, childhood health status, living conditions, and smoking habits.
Conclusion: MyCancerRisk aims to serve as a preventative screening tool, encouraging individuals to undergo testing and adopt healthier behaviours to mitigate the public health impact of cancer. Our study also sheds light on unconventional predictors, such as housing conditions, offering valuable insights for refining cancer prediction models.
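The abstract reports two metrics, AUC and ACC (accuracy). The sketch below illustrates how each is computed for a binary outcome. It is not the authors' code and uses no CHARLS data: the synthetic risk scores, the 1% outcome prevalence, the 0.5 decision threshold, and the function names `roc_auc`, `accuracy`, and `simulate` are all assumptions made for illustration. Under these assumptions, accuracy can sit near 0.99 simply because the outcome is rare, while AUC measures how well the scores rank cases above non-cases.

```python
import random


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions when scores >= threshold are called positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate(n=20000, prevalence=0.01, seed=0):
    """Hypothetical data: a rare outcome with scores that separate the classes only moderately."""
    rng = random.Random(seed)
    labels, scores = [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        mean = 0.15 if y else 0.08  # assumed: cases score slightly higher on average
        s = min(max(rng.gauss(mean, 0.07), 0.0), 1.0)  # clip to [0, 1]
        labels.append(y)
        scores.append(s)
    return labels, scores


labels, scores = simulate()
auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(f"AUC = {auc:.2f}, ACC = {acc:.2f}")
```

In this synthetic setting almost no score reaches the 0.5 threshold, so nearly every participant is predicted negative and accuracy is close to one minus the prevalence. AUC does not depend on the threshold, which is why the two numbers can differ so much.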
Pages: 27-35
Page count: 9
Related papers
48 in total
  • [31] Data-driven prediction of electrospun nanofiber diameter using machine learning: A comprehensive study and web-based tool development
    Sukpancharoen, Somboon
    Wijakmatee, Thossaporn
    Katongtung, Tossapon
    Ponhan, Kowit
    Rattanachoung, Nopporn
    Khojitmate, Sujira
    RESULTS IN ENGINEERING, 2024, 24
  • [32] Association between self-reported sleep duration and serum lipid profile in a middle-aged and elderly population in Taiwan: a community-based, cross-sectional study
    Lin, Pu
    Chang, Kai-Ting
    Lin, Yan-An
    Tzeng, I-Shiang
    Chuang, Hai-Hua
    Chen, Jau-Yuan
    BMJ OPEN, 2017, 7 (10)
  • [33] The Relationship between Self-Reported Sitting Time and Vitamin D Levels in Middle-Aged and Elderly Taiwanese Population: A Community-Based Cross-Sectional Study
    Chang, Yu-Hsuan
    Lin, Chun-Ru
    Shih, Yu-Lin
    Shih, Chin-Chuan
    Chen, Jau-Yuan
    NUTRIENTS, 2023, 15 (22)
  • [34] A Web-Based Prediction Model for Cancer-Specific Survival of Middle-Aged Patients With Non-metastatic Renal Cell Carcinoma: A Population-Based Study
    Tang, Jie
    Wang, Jinkui
    Pan, Xiudan
    Liu, Xiaozhu
    Zhao, Binyi
    FRONTIERS IN PUBLIC HEALTH, 2022, 10
  • [35] The association between polluting fuel use and self-reported insomnia among middle-aged and elderly Indian adults: Cross-sectional findings from the longitudinal ageing study in India
    Leng, Siqi
    Jin, Yuming
    Tang, Xiangdong
    JOURNAL OF THE NEUROLOGICAL SCIENCES, 2023, 455
  • [36] Using machine learning-based algorithms to construct cardiovascular risk prediction models for Taiwanese adults based on traditional and novel risk factors
    Cheng, Chien-Hsiang
    Lee, Bor-Jen
    Nfor, Oswald Ndi
    Hsiao, Chih-Hsuan
    Huang, Yi-Chia
    Liaw, Yung-Po
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
  • [37] Using machine learning models to identify the risk of depression in middle-aged and older adults with frequent and infrequent nicotine use: A cross-sectional study
    Qiu, Yuran
    Ma, Xu
    JOURNAL OF AFFECTIVE DISORDERS, 2024, 367 : 554 - 561
  • [38] Characterisation of cardiovascular disease (CVD) incidence and machine learning risk prediction in middle-aged and elderly populations: data from the China health and retirement longitudinal study (CHARLS)
    Huang, Qing
    Jiang, Zihao
    Shi, Bo
    Meng, Jiaxu
    Shu, Li
    Hu, Fuyong
    Mi, Jing
    BMC PUBLIC HEALTH, 2025, 25 (01)
  • [39] Alcohol consumption and other risk factors for self-reported diabetes among middle-aged Japanese: a population-based prospective study in the JPHC study cohort I
    Waki, K
    Noda, M
    Sasaki, S
    Matsumura, Y
    Takahashi, Y
    Isogawa, A
    Ohashi, Y
    Kadowaki, T
    Tsugane, S
    DIABETIC MEDICINE, 2005, 22 (03) : 323 - 331
  • [40] Comparison of a web-based food record tool and a food-frequency questionnaire and objective validation using the doubly labelled water technique in a Swedish middle-aged population
    Nybacka, Sanna
    Forslund, Helene Berteus
    Wirfalt, Elisabet
    Larsson, Ingrid
    Ericson, Ulrika
    Lemming, Eva Warensjo
    Bergstrom, Goran
    Hedblad, Bo
    Winkvist, Anna
    Lindroos, Anna Karin
    JOURNAL OF NUTRITIONAL SCIENCE, 2016, 5