An explainable artificial intelligence framework for risk prediction of COPD in smokers

Times cited: 4
Authors
Wang, Xuchun [1]
Qiao, Yuchao [1]
Cui, Yu [1]
Ren, Hao [1]
Zhao, Ying [2]
Linghu, Liqin [1,2]
Ren, Jiahui [1]
Zhao, Zhiyang [1]
Chen, Limin [3]
Qiu, Lixia [1]
Affiliations
[1] Shanxi Medical University, School of Public Health, Department of Health Statistics, 56 South XinJian Road, Taiyuan 030001, People's Republic of China
[2] Shanxi Center for Disease Control and Prevention, Taiyuan 030012, Shanxi, People's Republic of China
[3] Shanxi Medical University, Fifth Hospital (Shanxi People's Hospital), Taiyuan 030012, Shanxi, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
COPD; Machine learning; Class imbalance; Prediction; Smokers; Obstructive pulmonary disease; Never smokers; Diagnosis; Classification; Spirometry; Asthma
DOI
10.1186/s12889-023-17011-w
Chinese Library Classification (CLC) number
R1 [Preventive Medicine and Hygiene]
Subject classification codes
1004; 120402
Abstract
Background: Because the early signs of chronic obstructive pulmonary disease (COPD) are inconspicuous, many affected individuals go unidentified and miss opportunities for timely prevention and treatment. The purpose of this study was to create an explainable artificial intelligence framework that combines data preprocessing, machine learning, and model interpretability methods to identify people at high risk of COPD in the smoking population and to provide a reasonable interpretation of the model predictions.

Methods: The data comprised questionnaire information, physical examination data, and the results of pulmonary function tests before and after bronchodilatation. First, factor analysis of mixed data (FAMD), Boruta feature selection, and NRSBoundary-SMOTE resampling were used to address missing data, high dimensionality, and class imbalance, respectively. Then, seven classification models (CatBoost, NGBoost, XGBoost, LightGBM, random forest, SVM, and logistic regression) were applied to model the risk level, and the decisions of the best machine learning (ML) model were explained using Shapley additive explanations (SHAP) and partial dependence plots (PDP).

Results: In the smoking population, age and 14 other variables were significant predictors of COPD. The CatBoost, random forest, and logistic regression models performed reasonably well on the imbalanced datasets. On the balanced datasets, CatBoost combined with NRSBoundary-SMOTE achieved the best classification performance when composite indicators (AUC, F1-score, and G-mean) were used as the model comparison criteria.
Age, COPD Assessment Test (CAT) score, gross annual income, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), anhelation (shortness of breath), respiratory disease, central obesity, use of polluting fuel for household heating, region, use of polluting fuel for household cooking, and wheezing were important factors for predicting COPD in the smoking population.

Conclusion: This study combined feature screening, imbalanced-data processing, and advanced machine learning methods to enable early identification of groups at risk of COPD in the smoking population. COPD risk factors among smokers were identified using SHAP and PDP, with the goal of providing theoretical support for targeted screening strategies and for self-management strategies in the smoking population.
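The Methods rely on SMOTE-style oversampling to correct class imbalance. As a rough illustration only, the sketch below implements plain SMOTE interpolation in standard-library Python: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. NRSBoundary-SMOTE is, as we understand it, a boundary-focused variant that uses a neighbourhood rough set model to decide which minority samples to oversample; that selection step is not reproduced here. The function names `smote` and `k_nearest` and the toy data are hypothetical, not the authors' code.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Plain SMOTE sketch (hypothetical helper): interpolate between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()  # uniform in [0, 1): position along the segment
        x, nn = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


if __name__ == "__main__":
    # Toy two-feature minority class (e.g. scaled BMI, scaled age); invented data.
    minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
    for point in smote(minority, 4, k=2, seed=42):
        print(point)
```

Because every synthetic point lies between two real minority samples, oversampling stays inside the region the minority class already occupies. Boundary-oriented variants narrow this further to the samples near the decision boundary.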
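The Results compare models with composite indicators (AUC, F1-score, and G-mean) rather than accuracy alone. The sketch below computes the three metrics from scratch for a binary COPD (1) / non-COPD (0) label. The function names and toy data are hypothetical. The abstract does not state how the three indicators were combined into a single comparison criterion, so no combination is attempted here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = COPD, 0 = no COPD."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented example: 3 COPD cases among 10 smokers, scores from some model.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05, 0.15, 0.25]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(f"F1     = {f1_score(y_true, y_pred):.4f}")  # 0.6667
    print(f"G-mean = {g_mean(y_true, y_pred):.4f}")    # 0.7559
    print(f"AUC    = {auc(y_true, scores):.4f}")       # 0.8571
```

G-mean multiplies sensitivity by specificity, so a model that labels every smoker as non-COPD scores 0 on it even when its accuracy is high. This is why G-mean is commonly paired with F1 and AUC on imbalanced data.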
Pages: 18
Source: BMC Public Health, vol. 23