Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing

Cited by: 2
Authors
Sung, Sheng-Feng [1, 2]
Sung, Kuan-Lin [3]
Pan, Ru-Chiou [4]
Lee, Pei-Ju [5, 6]
Hu, Ya-Han [7]
Affiliations
[1] Ditmanson Med Fdn, Dept Internal Med, Div Neurol, Chiayi Christian Hosp, Chiayi, Taiwan
[2] Min Hwei Jr Coll Hlth Care Management, Dept Nursing, Tainan, Taiwan
[3] Natl Taiwan Univ, Sch Med, Taipei, Taiwan
[4] Ditmanson Med Fdn, Clin Data Ctr, Chiayi Christian Hosp, Dept Med Res, Chiayi, Taiwan
[5] Natl Chung Cheng Univ, Dept Informat Management, Minxiong Township, Chiayi County, Taiwan
[6] Natl Chung Cheng Univ, Inst Healthcare Informat Management, Minxiong Township, Chiayi County, Taiwan
[7] Natl Cent Univ, Dept Informat Management, Taoyuan, Taiwan
Keywords
atrial fibrillation; electronic health records; ischemic stroke; natural language processing; prediction; TRANSIENT ISCHEMIC ATTACK; TEXT CLASSIFICATION; FEATURE-SELECTION; VASCULAR EVENTS; STROKE CARE; SCORE; VALIDATION; RECURRENCE; PREDICTION; TAIWAN;
DOI
10.3389/fcvm.2022.941237
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
Background: Timely detection of atrial fibrillation (AF) after stroke is highly clinically relevant, aiding decisions on the optimal strategies for secondary prevention of stroke. In the context of limited medical resources, it is crucial to prioritize extended heart rhythm monitoring by stratifying patients into risk groups according to their likelihood of newly detected AF (NDAF). This study aimed to develop an electronic health record (EHR)-based machine learning model to assess the risk of NDAF at an early stage after stroke.
Methods: Linked data from a hospital stroke registry and a deidentified research-based database, including EHRs and administrative claims data, were used. Demographic features, physiological measurements, routine laboratory results, and clinical free text were extracted from EHRs. The extreme gradient boosting algorithm was used to build the prediction model. Prediction performance was evaluated by the C-index and compared with that of the AS5F and CHASE-LESS scores.
Results: The study population consisted of a training set of 4,064 patients and a temporal test set of 1,492 patients. During a median follow-up of 10.2 months, the incidence rate of NDAF was 87.0 per 1,000 person-years in the test set. On the test set, the model based on both structured and unstructured data achieved a C-index of 0.840, which was significantly higher than those of the AS5F (0.779, p = 0.023) and CHASE-LESS (0.768, p = 0.005) scores.
Conclusions: It is feasible to build a machine learning model to assess the risk of NDAF based on EHR data available at the time of hospital admission. Inclusion of information derived from clinical free text can significantly improve model performance, and the resulting model may outperform risk scores developed using traditional statistical methods. Further studies are needed to assess the clinical usefulness of the prediction model.
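The abstract reports performance as a C-index but does not say how it was computed. As a rough illustration only, the sketch below shows one common definition (Harrell's concordance index for right-censored follow-up data) in plain Python. The function name `harrell_c_index`, the tie-handling rule, and the toy data are assumptions made for this example; they are not taken from the study.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- follow-up time per patient
    events      -- 1 if the event (e.g. NDAF) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): skip pairs with tied times
        # 'a' is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Synthetic toy data, not from the study.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.7, 0.3, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_scores))  # 0.875
```

On the toy data, 7 of the 8 comparable pairs are ordered correctly, giving 0.875. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.840, 0.779, and 0.768 should be read.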
Pages: 11