Classification of Severe Maternal Morbidity from Electronic Health Records Written in Spanish Using Natural Language Processing

Cited by: 4
Authors
Torres-Silva, Ever A. [1 ]
Rua, Santiago [2 ]
Giraldo-Forero, Andres F. [1 ]
Durango, Maria C. [3 ]
Florez-Arango, Jose F. [4 ]
Orozco-Duque, Andres [3 ]
Affiliations
[1] Inst Tecnol Metropolitano, Fac Engn, Medellin 050034, Colombia
[2] Univ Nacl Abierta & Distancia, Sch Basic Sci Technol & Engn, Bogota 111321, Colombia
[3] Inst Tecnol Metropolitano, Dept Appl Sci, Medellin 050034, Colombia
[4] Weill Cornell Med, Populat Hlth Sci, New York, NY 10065 USA
Source
APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 19
Keywords
electronic health records; machine learning; maternal health; pregnancy complications; natural language processing; word-embedding; MACHINE; EMBEDDINGS;
DOI
10.3390/app131910725
Chinese Library Classification number
O6 [Chemistry];
Subject classification code
0703;
Abstract
One stepping stone toward reducing maternal mortality is identifying severe maternal morbidity (SMM) using electronic health records (EHRs). We aim to develop a pipeline to represent and classify the unstructured text of maternal progress notes into eight classes according to the silver labels defined by the ICD-10 codes associated with SMM. We preprocessed the text by removing protected health information (PHI) and reducing stop words. We built different pipelines to classify SMM by combining six word-embedding schemes, three approaches to document representation (average, clustering, and principal component analysis), and five well-known machine learning classifiers. Additionally, we implemented an algorithm for typo and misspelling adjustment based on the Levenshtein distance to the Spanish Billion Word Corpus dictionary. We analyzed 43,529 documents, each constructed from an average of 4.15 progress notes, from 22,937 patients. The best-performing pipeline combined Word2Vec, typo and spelling adjustment, document representation by PCA, and an SVM classifier. We found that it is possible to identify conditions such as miscarriage complications or hypertensive disorders from clinical notes written in Spanish, with a true positive rate higher than 0.85. This is the first approach to classify SMM from the unstructured text contained in maternal EHRs, and it can contribute to solving one of the most important public health problems in the world. Future work should test other representation and classification approaches to detect the risk of SMM.
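The abstract mentions typo and misspelling adjustment based on the Levenshtein distance to a dictionary. As a minimal illustrative sketch only (not the authors' implementation), the idea might look like the following. The function names `levenshtein` and `adjust_token`, the `max_distance` threshold, and the tiny sample vocabulary are all assumptions made for this example; the paper's dictionary is the Spanish Billion Word Corpus, which is not loaded here.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def adjust_token(token: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Replace a token with its nearest vocabulary word.

    Hypothetical helper: tokens already in the vocabulary are returned as-is,
    and tokens farther than max_distance from every word are left unchanged.
    Ties are broken by vocabulary order (the first closest word wins).
    """
    words = list(vocabulary)
    if token in words:
        return token
    best_word, best_distance = token, max_distance + 1
    for word in words:
        distance = levenshtein(token, word)
        if distance < best_distance:
            best_word, best_distance = word, distance
    return best_word


# Illustrative vocabulary only; a real dictionary would be far larger.
SAMPLE_VOCABULARY = ["hipertension", "aborto", "hemorragia", "paciente"]
corrected = [adjust_token(t, SAMPLE_VOCABULARY) for t in ["hipertencion", "abortto", "xyz"]]
```

In this sketch the first two tokens are mapped to their nearest dictionary words, while "xyz" is left unchanged because no vocabulary word lies within the distance threshold. A linear scan over the vocabulary is fine for illustration, but a dictionary with a very large number of entries would likely need an index or candidate pruning.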
Pages: 17
Related papers
50 in total
  • [21] Identification of recurrent atrial fibrillation using natural language processing applied to electronic health records
    Zheng, Chengyi
    Lee, Ming-sum
    Bansal, Nisha
    Go, Alan S.
    Chen, Cheng
    Harrison, Teresa N.
    Fan, Dongjie
    Allen, Amanda
    Garcia, Elisha
    Lidgard, Ben
    Singer, Daniel
    An, Jaejin
    EUROPEAN HEART JOURNAL-QUALITY OF CARE AND CLINICAL OUTCOMES, 2024, 10 (01) : 77 - 88
  • [22] Using Natural Language Processing on Electronic Health Records to Enhance Detection and Prediction of Psychosis Risk
    Irving, Jessica
    Patel, Rashmi
    Oliver, Dominic
    Colling, Craig
    Pritchard, Megan
    Broadbent, Matthew
    Baldwin, Helen
    Stahl, Daniel
    Stewart, Robert
    Fusar-Poli, Paolo
    SCHIZOPHRENIA BULLETIN, 2021, 47 (02) : 405 - 414
  • [23] Natural Language Processing to Improve Prediction of Incident Atrial Fibrillation Using Electronic Health Records
    Ashburner, Jeffrey M.
    Chang, Yuchiao
    Wang, Xin
    Khurshid, Shaan
    Anderson, Christopher D.
    Dahal, Kumar
    Weisenfeld, Dana
    Cai, Tianrun
    Liao, Katherine P.
    Wagholikar, Kavishwar B.
    Murphy, Shawn N.
    Atlas, Steven J.
    Lubitz, Steven A.
    Singer, Daniel E.
    JOURNAL OF THE AMERICAN HEART ASSOCIATION, 2022, 11 (15):
  • [24] Ascertainment of asthma prognosis using natural language processing from electronic medical records
    Sohn, Sunghwan
    Wi, Chung-Il
    Wu, Stephen T.
    Liu, Hongfang
    Ryu, Euijung
    Krusemark, Elizabeth
    Seabright, Alicia
    Voge, Gretchen A.
    Juhn, Young J.
    JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY, 2018, 141 (06) : 2292 - 2294
  • [25] ARTERIAL: A Natural Language Processing Model for Prevention of Information Leakage from Electronic Health Records
    Goldschmidt, Guilherme
    Zeiser, Felipe Andre
    Righi, Rodrigo da Rosa
    da Costa, Cristiano Andre
    2023 XIII BRAZILIAN SYMPOSIUM ON COMPUTING SYSTEMS ENGINEERING, SBESC, 2023,
  • [26] Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis
    Rybinski, Maciej
    Dai, Xiang
    Singh, Sonit
    Karimi, Sarvnaz
    Nguyen, Anthony
    JMIR MEDICAL INFORMATICS, 2021, 9 (04)
  • [27] Applying Natural Language Processing Toolkits to Electronic Health Records - An Experience Report
    Barrett, Neil
    Weber-Jahnke, Jens H.
    ADVANCES IN INFORMATION TECHNOLOGY AND COMMUNICATION IN HEALTH, 2009, 143 : 441 - 446
  • [28] Natural language processing to identify lupus nephritis phenotype in electronic health records
    Deng, Yu
    Pacheco, Jennifer A.
    Ghosh, Anika
    Chung, Anh
    Mao, Chengsheng
    Smith, Joshua C.
    Zhao, Juan
    Wei, Wei-Qi
    Barnado, April
    Dorn, Chad
    Weng, Chunhua
    Liu, Cong
    Cordon, Adam
    Yu, Jingzhi
    Tedla, Yacob
    Kho, Abel
    Ramsey-Goldman, Rosalind
    Walunas, Theresa
    Luo, Yuan
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 22 (SUPPL 2)