Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing

Times cited: 2
Authors
Sung, Sheng-Feng [1 ,2 ]
Sung, Kuan-Lin [3 ]
Pan, Ru-Chiou [4 ]
Lee, Pei-Ju [5 ,6 ]
Hu, Ya-Han [7 ]
Affiliations
[1] Ditmanson Med Fdn, Dept Internal Med, Div Neurol, Chiayi Christian Hosp, Chiayi, Taiwan
[2] Min Hwei Jr Coll Hlth Care Management, Dept Nursing, Tainan, Taiwan
[3] Natl Taiwan Univ, Sch Med, Taipei, Taiwan
[4] Ditmanson Med Fdn, Clin Data Ctr, Chiayi Christian Hosp, Dept Med Res, Chiayi, Taiwan
[5] Natl Chung Cheng Univ, Dept Informat Management, Minxiong Township, Chiayi County, Taiwan
[6] Natl Chung Cheng Univ, Inst Healthcare Informat Management, Minxiong Township, Chiayi County, Taiwan
[7] Natl Cent Univ, Dept Informat Management, Taoyuan, Taiwan
Keywords
atrial fibrillation; electronic health records; ischemic stroke; natural language processing; prediction; TRANSIENT ISCHEMIC ATTACK; TEXT CLASSIFICATION; FEATURE-SELECTION; VASCULAR EVENTS; STROKE CARE; SCORE; VALIDATION; RECURRENCE; PREDICTION; TAIWAN
DOI
10.3389/fcvm.2022.941237
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: Timely detection of atrial fibrillation (AF) after stroke is highly clinically relevant because it informs decisions on the optimal strategy for secondary stroke prevention. When medical resources are limited, extended heart rhythm monitoring must be prioritized by stratifying patients according to their risk of newly detected AF (NDAF). This study aimed to develop an electronic health record (EHR)-based machine learning model to assess the risk of NDAF early after stroke.

Methods: Data from a hospital stroke registry were linked with a deidentified research-based database comprising EHRs and administrative claims data. Demographic features, physiological measurements, routine laboratory results, and clinical free text were extracted from the EHRs. The extreme gradient boosting algorithm was used to build the prediction model. Prediction performance was evaluated with the C-index and compared with that of the AS5F and CHASE-LESS scores.

Results: The study population consisted of a training set of 4,064 patients and a temporal test set of 1,492 patients. During a median follow-up of 10.2 months, the incidence rate of NDAF in the test set was 87.0 per 1,000 person-years. On the test set, the model based on both structured and unstructured data achieved a C-index of 0.840, significantly higher than those of the AS5F (0.779, p = 0.023) and CHASE-LESS (0.768, p = 0.005) scores.

Conclusions: It is feasible to build a machine learning model that assesses the risk of NDAF from EHR data available at hospital admission. Including information derived from clinical free text can significantly improve model performance, and such a model may outperform risk scores developed with traditional statistical methods. Further studies are needed to assess the clinical usefulness of the prediction model.
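The abstract reports model discrimination as a C-index and event frequency as an incidence rate per 1,000 person-years. As a hedged illustration only, the sketch below computes Harrell's C-index for right-censored follow-up data and a crude incidence rate. The function names, the toy data, and the simplification of skipping pairs with tied follow-up times are assumptions made for this example; the abstract does not state which C-index estimator or software the authors used, nor how the p-values comparing the model with the AS5F and CHASE-LESS scores were obtained.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data (simplified sketch).

    times:  observed follow-up time per patient (until event or censoring)
    events: 1 if the event (e.g. NDAF) was observed, 0 if censored
    risks:  predicted risk score; a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: pairs with tied times are simply skipped
        if not events[a]:
            continue  # earlier patient was censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier event was assigned the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def incidence_per_1000_person_years(n_events, person_years):
    """Crude incidence rate: events per 1,000 person-years of follow-up."""
    return 1000.0 * n_events / person_years


# Hypothetical toy data (not from the study): follow-up in months.
times = [2.0, 5.0, 7.0, 10.0]
events = [1, 0, 1, 0]
risks = [0.9, 0.3, 0.6, 0.7]
print(harrell_c_index(times, events, risks))  # 0.75
print(incidence_per_1000_person_years(10, 125.0))  # 80.0
```

In the toy data, four patient pairs are comparable and three of them rank the earlier event above the later observation, giving 0.75; a value of 0.5 corresponds to random ranking. Production analyses usually rely on an established survival-analysis library rather than this O(n²) loop.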
Pages: 11