An explainable artificial intelligence framework for risk prediction of COPD in smokers

Cited by: 4
Authors
Wang, Xuchun [1 ]
Qiao, Yuchao [1 ]
Cui, Yu [1 ]
Ren, Hao [1 ]
Zhao, Ying [2 ]
Linghu, Liqin [1 ,2 ]
Ren, Jiahui [1 ]
Zhao, Zhiyang [1 ]
Chen, Limin [3 ]
Qiu, Lixia [1 ]
Affiliations
[1] Shanxi Med Univ, Sch Publ Hlth, Dept Hlth Stat, 56 South XinJian Rd, Taiyuan 030001, Peoples R China
[2] Shanxi Ctr Dis Control & Prevent, Taiyuan 030012, Shanxi, Peoples R China
[3] Shanxi Med Univ, Hosp 5, Shanxi Peoples Hosp, Taiyuan 030012, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
COPD; Machine learning; Class imbalance; Prediction; Smokers; OBSTRUCTIVE PULMONARY-DISEASE; NEVER SMOKERS; DIAGNOSIS; CLASSIFICATION; SPIROMETRY; ASTHMA;
DOI
10.1186/s12889-023-17011-w
Chinese Library Classification
R1 [Preventive Medicine and Hygiene];
Discipline codes
1004 ; 120402 ;
Abstract
Background: Because early signs of Chronic Obstructive Pulmonary Disease (COPD) are inconspicuous, affected individuals often remain unidentified, missing opportunities for timely prevention and treatment. The purpose of this study was to create an explainable artificial intelligence framework combining data preprocessing methods, machine learning methods, and model interpretability methods to identify people at high risk of COPD in the smoking population and to provide a reasonable interpretation of model predictions.
Methods: The data comprised questionnaire information, physical examination data, and results of pulmonary function tests before and after bronchodilation. First, factor analysis of mixed data (FAMD), Boruta, and NRSBoundary-SMOTE resampling were used to address the missing-data, high-dimensionality, and class-imbalance problems. Then, seven classification models (CatBoost, NGBoost, XGBoost, LightGBM, random forest, SVM, and logistic regression) were applied to model the risk level, and the best machine learning (ML) model's decisions were explained using the Shapley additive explanations (SHAP) method and partial dependence plots (PDPs).
Results: In the smoking population, age and 14 other variables were significant predictors of COPD. The CatBoost, random forest, and logistic regression models performed reasonably well on the unbalanced dataset. CatBoost with NRSBoundary-SMOTE had the best classification performance on the balanced dataset when composite indicators (AUC, F1-score, and G-mean) were used as model comparison criteria. Age, COPD Assessment Test (CAT) score, gross annual income, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), anhelation, respiratory disease, central obesity, use of polluting fuel for household heating, region, use of polluting fuel for household cooking, and wheezing were important factors for predicting COPD in the smoking population.
Conclusion: This study combined feature screening methods, unbalanced-data processing methods, and advanced machine learning methods to enable early identification of COPD risk groups in the smoking population. COPD risk factors in the smoking population were identified using SHAP and PDPs, with the goal of providing theoretical support for targeted screening strategies and self-management strategies for the smoking population.
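The abstract compares models using composite indicators (AUC, F1-score, and G-mean), the latter being standard for imbalanced classification. A minimal sketch of how F1-score and G-mean follow from a binary confusion matrix — the counts below are illustrative, not taken from the paper:

```python
import math

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

def g_mean(tp, fn, tn, fp):
    """G-mean = sqrt(sensitivity * specificity); balances performance on both
    classes, so it is informative when the positive class (e.g. COPD) is rare."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)

# Illustrative confusion matrix for a rare positive class:
tp, fn, tn, fp = 8, 2, 85, 5
print(f"F1-score: {f1_score(tp, fp, fn):.4f}")    # 0.6957
print(f"G-mean:   {g_mean(tp, fn, tn, fp):.4f}")  # 0.8692
```

Note how F1 is pulled down by the modest precision (8/13) while G-mean stays high because both sensitivity and specificity are good — reporting both, as the study does, guards against either metric's blind spot.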
Pages: 18