A web-based tool for cancer risk prediction for middle-aged and elderly adults using machine learning algorithms and self-reported questions

Times cited: 0
Authors
Xiao, Xingjian [1]
Yi, Xiaohan [1]
Soe, Nyi Nyi [2, 3]
Latt, Phyu Mon [2, 3]
Lin, Luotao [4]
Chen, Xuefen [1]
Song, Hualing [1]
Sun, Bo [5]
Zhao, Hailei [1]
Xu, Xianglong [1, 2, 3, 6, 7]
Affiliations
[1] Shanghai Univ Tradit Chinese Med, Sch Publ Hlth, Shanghai, Peoples R China
[2] Monash Univ, Fac Med Nursing & Hlth Sci, Sch Translat Med, Clayton, Vic, Australia
[3] Alfred Hlth, Melbourne Sexual Hlth Ctr, Artificial Intelligence & Modelling Epidemiol Prog, Carlton, Vic, Australia
[4] Univ New Mexico, Dept Individual Family & Community Educ, Nutr & Dietet Program, Albuquerque, NM USA
[5] Shanghai Univ Tradit Chinese Med, LongHua Hosp, Endoscopy Ctr, Shanghai, Peoples R China
[6] Shanghai Univ Tradit Chinese Med, Bijie Inst, Bijie, Peoples R China
[7] Bijie Dist Ctr Dis Control & Prevent, Doctoral Workstat, Bijie, Peoples R China
Keywords
Cancer; Pan-cancer; Prediction; Web-based; Risk; Co-management; Co-prevention; Middle-aged; China; Machine learning; Health
DOI
10.1016/j.annepidem.2024.12.003
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification codes
1004; 120402
Abstract
Background: Globally, China is among the countries with relatively high cancer incidence and mortality rates.
Objective: To develop an online cancer risk prediction tool for middle-aged and elderly Chinese adults using machine learning algorithms and self-reported data.
Method: Using a cohort of 19,798 participants aged 45 years and older from the China Health and Retirement Longitudinal Study (2011–2018), we built predictive models with nine machine learning algorithms commonly used for classification and regression tasks: Logistic Regression (LR), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), Random Forest (RF), Gaussian Naive Bayes (GNB), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LGBM), eXtreme Gradient Boosting (XGBoost), and K-Nearest Neighbors (KNN). The models used non-invasive, self-reported predictors covering demographic, educational, marital, lifestyle, health-history, and other factors to predict the outcome "Cancer or Malignant Tumour". The predicted cancer types mainly include lung, breast, cervical, colorectal, gastric, and esophageal cancer, as well as other rare cancers.
Results: The resulting tool, MyCancerRisk, achieved an area under the ROC curve (AUC) of 0.75 and an accuracy (ACC) of 0.99 with the Random Forest algorithm using self-reported variables. Key predictors included age, self-rated health, sleep patterns, household heating source, childhood health status, living conditions, and smoking habits.
Conclusion: MyCancerRisk is intended as a preventive screening tool that encourages individuals to undergo testing and adopt healthier behaviours, helping to mitigate the public health impact of cancer. The study also highlights unconventional predictors, such as housing conditions, that may help refine cancer prediction models.
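The abstract reports an AUC of 0.75 alongside an ACC of 0.99. As a general illustration (not a claim about this study's data), accuracy and AUC can diverge sharply when the outcome is rare: accuracy rewards predicting the majority class, whereas AUC measures how well a model ranks cases with the outcome above cases without it. The minimal sketch below uses purely synthetic data; the 1% prevalence, the score distributions, and the helper names `simulate`, `accuracy`, and `roc_auc` are hypothetical assumptions, not the authors' code, data, or the actual cancer prevalence in the cohort.

```python
# Illustrative sketch only: synthetic data and hypothetical helper names.
# Shows how accuracy (ACC) and AUC can diverge when the outcome is rare.
import random


def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the true 0/1 label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def simulate(n=10_000, prevalence=0.01, seed=42):
    """Hypothetical risk scores: cases with the outcome score somewhat higher
    on average, but no score ever reaches the 0.5 decision threshold."""
    rng = random.Random(seed)
    labels, scores = [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        # Overlapping normal score distributions (assumed, not from the study).
        raw = rng.gauss(0.09 if y else 0.05, 0.04)
        labels.append(y)
        scores.append(min(max(raw, 0.0), 0.49))  # clip into [0, 0.49]
    return labels, scores


if __name__ == "__main__":
    labels, scores = simulate()
    print(f"observed prevalence: {sum(labels) / len(labels):.3f}")
    print(f"accuracy (ACC):      {accuracy(labels, scores):.3f}")
    print(f"AUC:                 {roc_auc(labels, scores):.3f}")
```

Because no synthetic score reaches the 0.5 threshold, every case is classified as negative and accuracy equals one minus the observed prevalence, while the AUC stays moderate because it depends only on how much the two score distributions overlap.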
Pages: 27–35
Page count: 9