A Data-Driven Comparative Analysis of Machine-Learning Models for Familial Hypercholesterolemia Detection

Author
Kocejko, Tomasz [1]
Affiliation
[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Dept Biomed Engn, PL-80233 Gdansk, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
machine learning; familial hypercholesterolemia; DLCN; model ensembles; diagnosis; population
DOI
10.3390/app142311187
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Featured Application: The presented study can contribute to improving familial hypercholesterolemia classification and may help reduce the number of undiagnosed cases of the disease.
Abstract: This study presents an assessment of familial hypercholesterolemia (FH) probability using different algorithms (CatBoost, XGBoost, Random Forest, SVM) and their ensembles, leveraging electronic health record data. The primary objective is to explore an enhanced method for estimating FH probability that surpasses the currently recommended Dutch Lipid Clinic Network (DLCN) score. The models were trained using the largest Polish cohort of patients enrolled in an FH clinic, all of whom underwent genetic testing for FH-associated mutations. The initial dataset comprised over 100 parameters per patient, which were reduced to 48 clinically accessible features to ensure applicability in routine outpatient settings. To preserve balance, the data were stratified according to DLCN score ranges (0-2, 3-5, 6-8, and ≥ 9), representing varying levels of FH likelihood. The dataset was then split into training and test sets with an 80/20 ratio. Machine-learning models were trained, with hyperparameters optimized via grid search. The accuracy of the DLCN score in predicting FH was first evaluated by examining the proportion of patients with positive DNA tests among those with a DLCN score of 6 or above, the threshold for genetic testing. The DLCN score demonstrated an accuracy of approximately 40%. In contrast, the CatBoost model and its ensembles achieved over 80% accuracy. While the DLCN score remains a clinically valuable tool, its diagnostic accuracy is limited. The findings indicate that the ML models offer a substantial improvement in the precision of FH diagnosis, demonstrating their potential to enhance clinical decision making in identifying patients with FH.
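The stratification, 80/20 split, and DLCN baseline described in the abstract can be illustrated with a minimal Python sketch. This is not the author's code: the record fields (`dlcn`, `dna_positive`), the function names, and the synthetic patients are all hypothetical, and the baseline is interpreted here as the fraction of DNA-positive patients among those with a DLCN score of 6 or above. Model training and grid search are left out.

```python
import random
from collections import defaultdict


def dlcn_stratum(score):
    """Map a DLCN score to one of the four ranges named in the abstract."""
    if score <= 2:
        return "0-2"
    if score <= 5:
        return "3-5"
    if score <= 8:
        return "6-8"
    return ">=9"


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split records roughly 80/20 within each DLCN stratum (hypothetical helper)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[dlcn_stratum(rec["dlcn"])].append(rec)
    train, test = [], []
    for stratum in sorted(by_stratum):
        group = by_stratum[stratum][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def positive_fraction_at_threshold(records, threshold=6):
    """Fraction of DNA-positive patients among those at or above the threshold."""
    referred = [r for r in records if r["dlcn"] >= threshold]
    if not referred:
        return None
    return sum(r["dna_positive"] for r in referred) / len(referred)


# Synthetic, randomly generated patients for illustration only (not study data).
_rng = random.Random(42)
patients = [
    {"dlcn": _rng.randint(0, 12), "dna_positive": _rng.random() < 0.3}
    for _ in range(200)
]
train, test = stratified_split(patients)
baseline = positive_fraction_at_threshold(patients)
```

Splitting inside each stratum keeps the proportion of every DLCN range about the same in the training and test sets, which is the balance the abstract refers to.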
Pages: 13