A Data-Driven Comparative Analysis of Machine-Learning Models for Familial Hypercholesterolemia Detection

Authors
Kocejko, Tomasz [1]
Affiliation
[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Dept Biomed Engn, PL-80233 Gdansk, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
machine learning; familial hypercholesterolemia; DLCN; model ensembles; DIAGNOSIS; POPULATION
DOI
10.3390/app142311187
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Featured Application: The presented study can contribute to improving the classification of familial hypercholesterolemia and may help reduce the number of undiagnosed cases of the disease.

Abstract: This study assesses familial hypercholesterolemia (FH) probability using different algorithms (CatBoost, XGBoost, Random Forest, SVM) and their ensembles, leveraging electronic health record data. The primary objective is to explore an enhanced method for estimating FH probability that surpasses the currently recommended Dutch Lipid Clinic Network (DLCN) score. The models were trained on the largest Polish cohort of patients enrolled in an FH clinic, all of whom underwent genetic testing for FH-associated mutations. The initial dataset comprised over 100 parameters per patient, which were reduced to 48 clinically accessible features to ensure applicability in routine outpatient settings. To preserve balance, the data were stratified according to DLCN score ranges (0-2, 3-5, 6-8, and >= 9), representing varying levels of FH likelihood. The dataset was then split into training and test sets at an 80/20 ratio. Machine-learning models were trained, with hyperparameters optimized via grid search. The accuracy of the DLCN score in predicting FH was first evaluated by examining the proportion of patients with positive DNA tests among those with a DLCN score of 6 or above, the threshold for genetic testing. The DLCN score demonstrated an accuracy of approximately 40%. In contrast, the CatBoost model and its ensembles achieved over 80% accuracy. While the DLCN score remains a clinically valuable tool, its diagnostic accuracy is limited. The findings indicate that the ML models offer a substantial improvement in the precision of FH diagnosis, demonstrating their potential to enhance clinical decision making in identifying patients with FH.
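The stratification, the 80/20 split, and the DLCN baseline described in the abstract can be illustrated with a minimal sketch. It is not the author's code: the record fields `dlcn` and `dna_positive`, the helper names, and the toy data are all hypothetical, and only the Python standard library is used in place of whatever tooling the study relied on. The baseline computed here is the share of DNA-positive patients among those at or above the testing threshold, which is the quantity usually called positive predictive value.

```python
import random
from collections import defaultdict


def dlcn_bin(score):
    """Map a DLCN score to one of the four ranges named in the abstract."""
    if score <= 2:
        return "0-2"
    if score <= 5:
        return "3-5"
    if score <= 8:
        return "6-8"
    return ">=9"


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split records so each DLCN range contributes the same share to the test set."""
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for rec in records:
        by_bin[dlcn_bin(rec["dlcn"])].append(rec)

    train, test = [], []
    for key in sorted(by_bin):
        group = by_bin[key][:]          # copy so the caller's list is not shuffled
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def dlcn_positive_rate(records, threshold=6):
    """Share of patients at or above the threshold whose DNA test was positive."""
    flagged = [r for r in records if r["dlcn"] >= threshold]
    if not flagged:
        return None
    return sum(r["dna_positive"] for r in flagged) / len(flagged)


# Synthetic toy records (NOT study data): five patients per score from 0 to 11,
# with an arbitrary rule deciding which scores count as DNA-positive.
toy = [
    {"dlcn": s, "dna_positive": s % 3 == 0}
    for s in range(12)
    for _ in range(5)
]

train, test = stratified_split(toy)
baseline = dlcn_positive_rate(toy)
print(len(train), len(test), baseline)
```

Splitting inside each DLCN range, instead of over the pooled data, keeps the four likelihood levels represented in the same proportion in both sets, which matters when the highest-score range is small. Hyperparameter search and model training would then run on `train` only, with `test` held back for the final comparison against the baseline.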
Pages: 13