Classification of Severe Maternal Morbidity from Electronic Health Records Written in Spanish Using Natural Language Processing

Cited by: 4
Authors
Torres-Silva, Ever A. [1]
Rua, Santiago [2]
Giraldo-Forero, Andres F. [1]
Durango, Maria C. [3]
Florez-Arango, Jose F. [4]
Orozco-Duque, Andres [3]
Affiliations
[1] Inst Tecnol Metropolitano, Fac Engn, Medellin 050034, Colombia
[2] Univ Nacl Abierta & Distancia, Sch Basic Sci Technol & Engn, Bogota 111321, Colombia
[3] Inst Tecnol Metropolitano, Dept Appl Sci, Medellin 050034, Colombia
[4] Weill Cornell Med, Populat Hlth Sci, New York, NY 10065 USA
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 19
Keywords
electronic health records; machine learning; maternal health; pregnancy complications; natural language processing; word-embedding; MACHINE; EMBEDDINGS
DOI
10.3390/app131910725
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
One step toward reducing maternal mortality is identifying severe maternal morbidity (SMM) from Electronic Health Records (EHRs). We aim to develop a pipeline that represents the unstructured text of maternal progress notes and classifies it into eight classes, using silver labels derived from the ICD-10 codes associated with SMM. We preprocessed the text by removing protected health information (PHI) and stop words. We built different pipelines to classify SMM by combining six word-embedding schemes, three document-representation approaches (averaging, clustering, and principal component analysis), and five well-known machine learning classifiers. Additionally, we implemented an algorithm that corrects typos and misspellings based on the Levenshtein distance to the Spanish Billion Word Corpus dictionary. We analyzed 43,529 documents, each built from an average of 4.15 progress notes, from 22,937 patients. The best-performing pipeline combined Word2Vec, typo and misspelling correction, PCA-based document representation, and an SVM classifier. We found that conditions such as miscarriage complications or hypertensive disorders can be identified from clinical notes written in Spanish with a true positive rate above 0.85. This is the first approach to classify SMM from the unstructured text of maternal EHRs, and it can contribute to addressing one of the most important public health problems in the world. Future work should test other representation and classification approaches to detect the risk of SMM.
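The abstract names two concrete text-processing steps: correcting misspelled tokens by Levenshtein distance to a reference dictionary, and representing a document by averaging its word vectors (one of the three representations compared). The record gives no implementation details, so the following is only a minimal sketch under stated assumptions: a small plain word list stands in for the Spanish Billion Word Corpus dictionary, a dict of toy vectors stands in for trained Word2Vec embeddings, and the names `correct_token`, `average_embedding`, and the `max_distance=2` threshold are hypothetical illustrations, not the authors' code or settings.

```python
from typing import Dict, List, Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_token(token: str, vocabulary: Sequence[str], max_distance: int = 2) -> str:
    """Hypothetical corrector: keep known words, otherwise map to the closest
    dictionary word within max_distance; leave the token unchanged if none is close."""
    if token in set(vocabulary):
        return token
    best, best_dist = token, max_distance + 1
    for word in vocabulary:  # first word wins on ties (strict <)
        dist = levenshtein(token, word)
        if dist < best_dist:
            best, best_dist = word, dist
    return best


def average_embedding(tokens: List[str],
                      vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; None if no token has a vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


if __name__ == "__main__":
    vocab = ["embarazo", "hipertension", "aborto", "sangrado"]  # toy dictionary
    note = ["embarazoo", "hipertensoin", "xyz"]                 # toy noisy tokens
    print([correct_token(t, vocab) for t in note])
```

Running the script maps the two misspelled tokens to `embarazo` and `hipertension` and leaves `xyz` untouched because no dictionary word lies within two edits. A real pipeline would need a faster lookup than this linear scan over a billion-word-corpus vocabulary; the sketch favors clarity over speed.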
Pages: 17