Development and validation of explainable machine-learning models for carotid atherosclerosis early screening

Cited by: 5

Authors:

Yun, Ke [1,2]

He, Tao [3]

Zhen, Shi [4]

Quan, Meihui [1,2]

Yang, Xiaotao [1,2]

Man, Dongliang [1,2]

Zhang, Shuang [1,2]

Wang, Wei [5]

Han, Xiaoxu [1,2,6,7]

Affiliations:

[1] China Med Univ, Affiliated Hosp 1, Natl Clin Res Ctr Lab Med, Shenyang, Liaoning, Peoples R China

[2] China Med Univ, Affiliated Hosp 1, Dept Lab Med, Shenyang, Liaoning, Peoples R China

[3] Neusoft Corp, Neusoft Res Inst, Shenyang, Liaoning, Peoples R China

[4] Northeastern Univ, Dept Software Engn, Shenyang, Liaoning, Peoples R China

[5] China Med Univ, Affiliated Hosp 1, Dept Phys Examinat Ctr, Shenyang, Liaoning, Peoples R China

[6] Chinese Acad Med Sci, Lab Med Innovat Unit, Shenyang, Liaoning, Peoples R China

[7] China Med Univ, Affiliated Hosp 1, NHC Key Lab AIDS Immunol, Shenyang, Liaoning, Peoples R China

Source:

JOURNAL OF TRANSLATIONAL MEDICINE | 2023 / Volume 21 / Issue 01

Keywords:

Machine learning; Carotid atherosclerosis; Explainable model; CHINESE ADULTS; RISK-FACTORS; PREVALENCE; ULTRASOUND; BURDEN; AGE; GENDER;

DOI:

10.1186/s12967-023-04093-8

Chinese Library Classification (CLC) number:

R-3 [Medical research methods]; R3 [Basic medicine];

Discipline classification code:

1001 ;

Abstract:

Background: Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. The aim of this study was to establish and validate machine learning (ML) models for early screening of CAS using routine health check-up indicators in northeast China.

Methods: A total of 69,601 health check-up records were collected between 2018 and 2019 from the health examination center of the First Hospital of China Medical University (Shenyang, China). Of the 2019 records, 80% were assigned to the training set and 20% to the testing set. The 2018 records were used as the external validation dataset. Ten ML algorithms were used to construct CAS screening models: decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting machine (XGB), gradient boosting decision tree (GBDT), linear support vector machine (SVM-linear), and non-linear support vector machine (SVM-nonlinear). Model performance was measured by the area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPR). The SHapley Additive exPlanations (SHAP) method was used to demonstrate the interpretability of the optimal model.

Results: A total of 6315 records of patients undergoing carotid ultrasonography were collected; of these, 1632, 407, and 1141 patients were diagnosed with CAS in the training, internal validation, and external validation datasets, respectively. The GBDT model achieved the highest performance, with an auROC of 0.860 (95% CI 0.839-0.880) in the internal validation dataset and 0.851 (95% CI 0.837-0.863) in the external validation dataset. Individuals with diabetes or those over 65 years of age showed low negative predictive value. In the interpretability analysis, age was the most important factor influencing the performance of the GBDT model, followed by sex and non-high-density lipoprotein cholesterol.

Conclusions: The ML models developed could provide good performance for CAS identification using routine health check-up indicators and could hopefully be applied in scenarios without ethnic and geographic heterogeneity for CAS prevention.
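
Two of the evaluation quantities named in the abstract, auROC and negative predictive value, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the auROC is computed with the standard pairwise-ranking (Mann-Whitney) formulation rather than any particular library.

```python
from typing import Sequence


def au_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) form.

    Equals the probability that a randomly chosen positive record receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def negative_predictive_value(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """Share of records screened negative (score < threshold) that are truly negative."""
    screened_negative = [y for y, s in zip(labels, scores) if s < threshold]
    if not screened_negative:
        raise ValueError("no record falls below the threshold")
    return screened_negative.count(0) / len(screened_negative)


# Toy data invented for this sketch: 1 = CAS on ultrasound, 0 = no CAS.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.30]

print(au_roc(labels, scores))                          # 0.8125
print(negative_predictive_value(labels, scores, 0.5))  # 0.6
```

In a screening setting, a low negative predictive value in a subgroup means that a negative model result rules out disease less reliably there, which is why the abstract reports it separately for individuals with diabetes and those over 65.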


Pages: 13
