Classification of Severe Maternal Morbidity from Electronic Health Records Written in Spanish Using Natural Language Processing

Cited by: 4
Authors
Torres-Silva, Ever A. [1]
Rua, Santiago [2]
Giraldo-Forero, Andres F. [1]
Durango, Maria C. [3]
Florez-Arango, Jose F. [4]
Orozco-Duque, Andres [3]
Affiliations
[1] Inst Tecnol Metropolitano, Fac Engn, Medellin 050034, Colombia
[2] Univ Nacl Abierta & Distancia, Sch Basic Sci Technol & Engn, Bogota 111321, Colombia
[3] Inst Tecnol Metropolitano, Dept Appl Sci, Medellin 050034, Colombia
[4] Weill Cornell Med, Populat Hlth Sci, New York, NY 10065 USA
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 19
Keywords
electronic health records; machine learning; maternal health; pregnancy complications; natural language processing; word-embedding; MACHINE; EMBEDDINGS
DOI
10.3390/app131910725
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
One stepping stone toward reducing maternal mortality is identifying severe maternal morbidity (SMM) from Electronic Health Records (EHRs). We aim to develop a pipeline that represents the unstructured text of maternal progress notes and classifies it into eight classes, according to silver labels defined by the ICD-10 codes associated with SMM. We preprocessed the text, removing protected health information (PHI) and reducing stop words. We built pipelines to classify SMM by combining six word-embedding schemes, three approaches to document representation (averaging, clustering, and principal component analysis), and five well-known machine learning classifiers. Additionally, we implemented an algorithm that corrects typos and misspellings based on the Levenshtein distance to the Spanish Billion Word Corpus dictionary. We analyzed 43,529 documents from 22,937 patients, each document built from an average of 4.15 progress notes. The best-performing pipeline combined Word2Vec, typo and spelling correction, document representation by PCA, and an SVM classifier. We found that conditions such as miscarriage complications or hypertensive disorders can be identified from clinical notes written in Spanish with a true positive rate higher than 0.85. This is the first approach to classifying SMM from the unstructured text contained in maternal EHRs, and it can contribute to solving one of the most important public health problems in the world. Future work should test other representation and classification approaches to detect the risk of SMM.
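The spelling-adjustment step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `levenshtein` and `correct_token`, the `max_distance` threshold, the alphabetical tie-breaking rule, and the toy vocabulary (a stand-in for the Spanish Billion Word Corpus dictionary) are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct_token(token: str, vocabulary: set, max_distance: int = 2) -> str:
    """Map a token to its nearest vocabulary word, if close enough.

    Hypothetical policy: in-vocabulary tokens are kept, ties are broken
    alphabetically, and tokens farther than max_distance are left as-is.
    """
    if token in vocabulary:
        return token
    best = min(vocabulary, key=lambda word: (levenshtein(token, word), word))
    return best if levenshtein(token, best) <= max_distance else token


# Toy vocabulary (assumption); a real run would load a large Spanish word list.
VOCAB = {"hipertension", "aborto", "hemorragia", "paciente"}

if __name__ == "__main__":
    for raw in ["hipertencion", "hemorrajia", "paciente", "xyz"]:
        print(raw, "->", correct_token(raw, VOCAB))
```

A linear scan over the vocabulary is fine for a toy list; for a dictionary with hundreds of thousands of entries, an index such as a BK-tree or a length-based prefilter would likely be needed to keep correction tractable.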
Pages: 17
Related papers
50 in total
  • [1] Learning to Identify Severe Maternal Morbidity from Electronic Health Records
    Gao, Cheng
    Osmundson, Sarah
    Yan, Xiaowei
    Edwards, Digna Velez
    Malin, Bradley A.
    Chen, You
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 143 - 147
  • [2] Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records
    Zhao, Sizheng Steven
    Hong, Chuan
    Cai, Tianrun
    Xu, Chang
    Huang, Jie
    Ermann, Joerg
    Goodson, Nicola J.
    Solomon, Daniel H.
    Cai, Tianxi
    Liao, Katherine P.
    RHEUMATOLOGY, 2020, 59 (05) : 1059 - 1065
  • [3] Ascertainment of Delirium Status Using Natural Language Processing From Electronic Health Records
    Fu, Sunyang
    Lopes, Guilherme S.
    Pagali, Sandeep R.
    Thorsteinsdottir, Bjoerg
    LeBrasseur, Nathan K.
    Wen, Andrew
    Liu, Hongfang
    Rocca, Walter A.
    Olson, Janet E.
    St Sauver, Jennifer
    Sohn, Sunghwan
    JOURNALS OF GERONTOLOGY SERIES A-BIOLOGICAL SCIENCES AND MEDICAL SCIENCES, 2022, 77 (03) : 524 - 530
  • [4] Using Natural Language Processing to Predict Risk in Electronic Health Records
    Duy Van Le
    Montgomery, James
    Kirkby, Kenneth
    Scanlan, Joel
    MEDINFO 2023 - THE FUTURE IS ACCESSIBLE, 2024, 310 : 574 - 578
  • [5] Natural language processing of admission notes predicts severe maternal morbidity
    Clapp, Mark A.
    Mccoy, Thomas H.
    Kim, Ellen
    James, Kaitlyn E.
    Perlis, Roy H.
    Kaimal, Anjali J.
    AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY, 2022, 226 (01) : S320 - S320
  • [6] Validation of Phenotyping Algorithms for Stroke From Electronic Health Records Using Natural Language Processing
    Zhao, Yiqing
    Fu, Suyang
    Larson, Nicholas B.
    Decker, Paul A.
    Chamberlain, Alanna M.
    Roger, Veronique L.
    Liu, Hongfang
    Bielinski, Suzette J.
    CIRCULATION, 2019, 139
  • [7] Leveraging Electronic Health Records to Learn Progression Path for Severe Maternal Morbidity
    Gao, Cheng
    Osmundson, Sarah
    Yan, Xiaowei
    Edwards, Digna Velez
    Malin, Bradley A.
    Chen, You
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 148 - 152
  • [8] OPTIMIZATION OF NATURAL LANGUAGE PROCESSING-SUPPORTED COMORBIDITY CLASSIFICATION ALGORITHMS IN ELECTRONIC HEALTH RECORDS
    Hooley, I.
    Chen, R.
    Long, L.
    Cohen, A.
    Adamson, B.
    VALUE IN HEALTH, 2019, 22 : S87 - S87
  • [9] Extracting social determinants of health from electronic health records using natural language processing: a systematic review
    Patra, Braja G.
    Sharma, Mohit M.
    Vekaria, Veer
    Adekkanattu, Prakash
    Patterson, Olga V.
    Glicksberg, Benjamin
    Lepow, Lauren A.
    Ryu, Euijung
    Biernacka, Joanna M.
    Furmanchuk, Al'ona
    George, Thomas J.
    Hogan, William
    Wu, Yonghui
    Yang, Xi
    Bian, Jiang
    Weissman, Myrna
    Wickramaratne, Priya
    Mann, J. John
    Olfson, Mark
    Campion, Thomas R., Jr.
    Weiner, Mark
    Pathak, Jyotishman
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2021, 28 (12) : 2716 - 2727
  • [10] Prediction of severe chest injury using natural language processing from the electronic health record
    Kulshrestha, Sujay
    Dligach, Dmitriy
    Joyce, Cara
    Baker, Marshall S.
    Gonzalez, Richard
    O'Rourke, Ann P.
    Glazer, Joshua M.
    Stey, Anne
    Kruser, Jacqueline M.
    Churpek, Matthew M.
    Afshar, Majid
    INJURY-INTERNATIONAL JOURNAL OF THE CARE OF THE INJURED, 2021, 52 (02) : 205 - 212