Classification of Severe Maternal Morbidity from Electronic Health Records Written in Spanish Using Natural Language Processing

Cited by: 4
Authors
Torres-Silva, Ever A. [1]
Rua, Santiago [2]
Giraldo-Forero, Andres F. [1]
Durango, Maria C. [3]
Florez-Arango, Jose F. [4]
Orozco-Duque, Andres [3]
Affiliations
[1] Inst Tecnol Metropolitano, Fac Engn, Medellin 050034, Colombia
[2] Univ Nacl Abierta & Distancia, Sch Basic Sci Technol & Engn, Bogota 111321, Colombia
[3] Inst Tecnol Metropolitano, Dept Appl Sci, Medellin 050034, Colombia
[4] Weill Cornell Med, Populat Hlth Sci, New York, NY 10065 USA
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 19
Keywords
electronic health records; machine learning; maternal health; pregnancy complications; natural language processing; word-embedding; MACHINE; EMBEDDINGS;
DOI
10.3390/app131910725
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
One step toward reducing maternal mortality is to identify severe maternal morbidity (SMM) using electronic health records (EHRs). We aim to develop a pipeline to represent and classify the unstructured text of maternal progress notes into eight classes according to the silver labels defined by the ICD-10 codes associated with SMM. We preprocessed the text, removing protected health information (PHI) and reducing stop words. We built different pipelines to classify SMM by combining six word-embedding schemes, three approaches to document representation (averaging, clustering, and principal component analysis), and five well-known machine learning classifiers. Additionally, we implemented an algorithm for correcting typos and misspellings based on the Levenshtein distance to the Spanish Billion Word Corpus dictionary. We analyzed 43,529 documents, each constructed from an average of 4.15 progress notes, from 22,937 patients. The best-performing pipeline combined Word2Vec, typo and spelling correction, document representation by PCA, and an SVM classifier. We found that it is possible to identify conditions such as miscarriage complications or hypertensive disorders from clinical notes written in Spanish, with a true positive rate higher than 0.85. This is the first approach to classify SMM from the unstructured text contained in maternal EHRs, and it can contribute to solving one of the most important public health problems in the world. Future work should test other representation and classification approaches to detect the risk of SMM.
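The typo adjustment mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `max_distance` threshold, the alphabetical tie-breaking, and the tiny vocabulary (a stand-in for the Spanish Billion Word Corpus dictionary) are all assumptions made for illustration.

```python
from typing import Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct_token(token: str, vocabulary: Set[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word within max_distance, else the token unchanged.

    max_distance is a hypothetical threshold, not a value from the paper.
    """
    if token in vocabulary:
        return token
    best, best_distance = token, max_distance + 1
    for word in sorted(vocabulary):  # sorted so ties resolve deterministically
        distance = levenshtein(token, word)
        if distance < best_distance:
            best, best_distance = word, distance
    return best


# Hypothetical stand-in for the reference dictionary.
VOCABULARY = {"hipertension", "aborto", "embarazo"}
corrected = [correct_token(t, VOCABULARY) for t in ["hipertencion", "abrto", "xyz"]]
```

In this sketch, tokens with no dictionary word within the threshold are left unchanged, which avoids rewriting abbreviations or drug names that the dictionary does not cover. A linear scan over a large dictionary is slow, so a real pipeline would likely need an index or candidate filter.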
Pages: 17
Related papers
50 in total
  • [31] Natural Language Processing to Identify Lupus Nephritis Phenotype in Electronic Health Records
    Deng, Yu
    Pacheco, Jennifer
    Chung, Anh
    Mao, Chengsheng
    Smith, Joshua
    Zhao, Juan
    Wei, Wei-Qi
    Barnado, April
    Weng, Chunhua
    Liu, Cong
    Gordon, Adam
    Yu, Jingzhi
    Tedla, Yacob
    Kho, Abel
    Ramsey-Goldman, Rosalind
    Walunas, Theresa
    Luo, Yuan
    ARTHRITIS & RHEUMATOLOGY, 2021, 73 : 666 - 667
  • [32] Natural Language Processing Identifies Goals of Care Documentation in Electronic Health Records
    Joehl, Hillarie E.
    Friend, Patricia
    JOURNAL OF PAIN AND SYMPTOM MANAGEMENT, 2024, 67 (05) : E720 - E721
  • [33] Neural Natural Language Processing for unstructured data in electronic health records: A review
    Li, Irene
    Pan, Jessica
    Goldwasser, Jeremy
    Verma, Neha
    Wong, Wai Pan
    Nuzumlali, Muhammed Yavuz
    Rosand, Benjamin
    Li, Yixin
    Zhang, Matthew
    Chang, David
    Taylor, R. Andrew
    Krumholz, Harlan M.
    Radev, Dragomir
    COMPUTER SCIENCE REVIEW, 2022, 46
  • [34] Natural language generation for electronic health records
    Lee, Scott H.
    NPJ DIGITAL MEDICINE, 2018, 1
  • [35] Natural language generation for electronic health records
    Scott H. Lee
    npj Digital Medicine, 1
  • [36] Natural language processing of admission notes to predict severe maternal morbidity during the delivery encounter
    Clapp, Mark A.
    Kim, Ellen
    James, Kaitlyn E.
    Perlis, Roy H.
    Kaimal, Anjali J.
    McCoy, Thomas H., Jr.
    AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY, 2022, 227 (03)
  • [37] Deep Learning Approaches for Predicting Glaucoma Progression Using Electronic Health Records and Natural Language Processing
    Wang, Sophia Y.
    Tseng, Benjamin
    Hernandez-Boussard, Tina
    OPHTHALMOLOGY SCIENCE, 2022, 2 (02)
  • [38] Active Computerized Pharmacovigilance Using Natural Language Processing, Statistics, and Electronic Health Records: A Feasibility Study
    Wang, Xiaoyan
    Hripcsak, George
    Markatou, Marianthi
    Friedman, Carol
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2009, 16 (03) : 328 - 337
  • [39] Identifying ANCA-Associated Vasculitis Cases in Electronic Health Records Using Natural Language Processing
    Wallace, Zachary
    Stone, John H.
    Choi, Hyon K.
    ARTHRITIS & RHEUMATOLOGY, 2018, 70
  • [40] Identifying Suicidal Adolescents from Mental Health Records Using Natural Language Processing
    Velupillai, Sumithra
    Epstein, Sophie
    Bittar, Andre
    Stephenson, Thomas
    Dutta, Rina
    Downs, Johnny
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 413 - 417