Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports

Cited by: 51
Authors
Ong, Charlene Jennifer [1, 2, 3, 4]
Orfanoudaki, Agni [4]
Zhang, Rebecca [4]
Caprasse, Francois Pierre M. [4]
Hutch, Meghan [1, 2]
Ma, Liang [1]
Fard, Darian [1]
Balogun, Oluwafemi [1, 2]
Miller, Matthew I. [1]
Minnig, Margaret [1]
Saglam, Hanife [3]
Prescott, Brenton [2]
Greer, David M. [1, 2]
Smirnakis, Stelios [3]
Bertsimas, Dimitris [4, 5]
Affiliations
[1] Boston Univ, Sch Med, Boston, MA 02118 USA
[2] Boston Med Ctr, Boston, MA 02118 USA
[3] Harvard Med Sch, Boston, MA 02115 USA
[4] MIT, Operat Res Ctr, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[5] MIT, Sloan Sch Management, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
PLOS ONE | 2020 / Vol. 15 / Issue 06
Keywords
ANNOTATION;
DOI
10.1371/journal.pone.0234908
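Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract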
Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could considerably improve the identification of stroke in large datasets, the triage of critical clinical reports, and quality improvement efforts. In this study, we developed and report a comprehensive framework for studying the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods to determine the presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging radiology reports from 17,864 patients at two large academic medical centers. We used standard techniques to featurize unstructured text and developed neurovascular-specific GloVe word embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for our three tasks (AUCs of 0.96, 0.98, and 0.93, respectively). Among simpler NLP approaches, Bag of Words (BOW) performed best when paired with an interpretable algorithm (Logistic Regression) for identifying ischemic stroke (AUC of 0.95), middle cerebral artery (MCA) location (AUC of 0.96), and acuity (AUC of 0.90). In our external test set, GloVe with Recurrent Neural Networks likewise generalized better than BOW with Logistic Regression for stroke presence, location, and acuity (AUCs of 0.92, 0.89, and 0.93 versus 0.89, 0.86, and 0.80, respectively). Our study provides a comprehensive assessment of NLP techniques for unstructured radiographic text. Our findings suggest that NLP/ML methods can be used to discriminate stroke features in large data cohorts for both clinical and research-related investigations.
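To make the simpler of the two pipelines in the abstract concrete, the following is a minimal sketch of Bag of Words featurization, a logistic regression classifier, a 75%/25% train/test split, and AUC as the evaluation metric. It is not the authors' code. The report snippets are invented toy strings, and every function name, the learning rate, and the epoch count are assumptions chosen for illustration; the study itself used 1,359 expert-labeled reports and may have used different tooling and preprocessing.

```python
import math
from collections import Counter

# Invented toy snippets for illustration only (NOT data from the study).
# POSITIVE = ischemic stroke present, NEGATIVE = absent.
POSITIVE = [
    "acute infarct in left mca territory",
    "restricted diffusion consistent with acute ischemic stroke",
    "subacute infarct in right mca distribution",
    "ischemic infarct with restricted diffusion",
]
NEGATIVE = [
    "no acute intracranial abnormality",
    "normal study no hemorrhage or mass",
    "stable chronic changes no intracranial hemorrhage",
    "normal study no intracranial abnormality",
]

def split(items, train_frac=0.75):
    """Deterministic split: the first 75% of each class trains, the rest tests."""
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

def build_vocab(texts):
    """Map each distinct training word to a feature index."""
    words = sorted({w for t in texts for w in t.split()})
    return {w: i for i, w in enumerate(words)}

def featurize(text, vocab):
    """Bag of Words: one count per vocabulary word; unseen words are dropped."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(text.split()).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Predicted probability that the report describes an ischemic stroke."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Unregularized logistic regression fitted by batch gradient ascent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = yi - predict(xi, w, b)
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b += lr * grad_b / n
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

pos_train, pos_test = split(POSITIVE)
neg_train, neg_test = split(NEGATIVE)
train_texts = pos_train + neg_train
train_labels = [1] * len(pos_train) + [0] * len(neg_train)
test_texts = pos_test + neg_test
test_labels = [1] * len(pos_test) + [0] * len(neg_test)

vocab = build_vocab(train_texts)
w, b = train_logreg([featurize(t, vocab) for t in train_texts], train_labels)
test_scores = [predict(featurize(t, vocab), w, b) for t in test_texts]
test_auc = auc(test_scores, test_labels)
print("held-out AUC on toy data:", test_auc)
```

Because each learned weight maps to a single word, the fitted model can be inspected directly, which is one sense in which logistic regression on Bag of Words features is called interpretable. The toy data are trivially separable, so the AUC printed here says nothing about the performance reported in the abstract.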
Pages: 16