Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports

Cited by: 51
Authors
Ong, Charlene Jennifer [1, 2, 3, 4]
Orfanoudaki, Agni [4]
Zhang, Rebecca [4]
Caprasse, Francois Pierre M. [4]
Hutch, Meghan [1, 2]
Ma, Liang [1]
Fard, Darian [1]
Balogun, Oluwafemi [1, 2]
Miller, Matthew I. [1]
Minnig, Margaret [1]
Saglam, Hanife [3]
Prescott, Brenton [2]
Greer, David M. [1, 2]
Smirnakis, Stelios [3]
Bertsimas, Dimitris [4, 5]
Affiliations
[1] Boston Univ, Sch Med, Boston, MA 02118 USA
[2] Boston Med Ctr, Boston, MA 02118 USA
[3] Harvard Med Sch, Boston, MA 02115 USA
[4] MIT, Operat Res Ctr, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[5] MIT, Sloan Sch Management, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
PLOS ONE | 2020 / Vol. 15 / Issue 06
Keywords
ANNOTATION;
DOI
10.1371/journal.pone.0234908
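Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract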
Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could provide considerable improvement in identifying stroke in large datasets, triaging critical clinical reports, and quality improvement efforts. In this study, we developed and report a comprehensive framework studying the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods to determine presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging Radiology reports from 17,864 patients from two large academic medical centers. We used standard techniques to featurize unstructured text and developed neurovascular specific word GloVe embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for our three tasks (AUCs of 0.96, 0.98, 0.93 respectively). Simpler NLP approaches (Bag of Words) performed best with interpretable algorithms (Logistic Regression) for identifying ischemic stroke (AUC of 0.95), MCA location (AUC 0.96), and acuity (AUC of 0.90). Similarly, GloVe and Recurrent Neural Networks (AUC 0.92, 0.89, 0.93) generalized better in our external test set than BOW and Logistic Regression for stroke presence, location and acuity, respectively (AUC 0.89, 0.86, 0.80). Our study demonstrates a comprehensive assessment of NLP techniques for unstructured radiographic text. 
Our findings suggest that NLP/ML methods can be used to discriminate stroke features from large data cohorts for both clinical and research-related investigations.
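The abstract names Bag of Words features, Logistic Regression, a held-out split, and AUC, but gives no implementation details. The following is only a minimal sketch of that general kind of pipeline, using the Python standard library. The report snippets, the tiny vocabulary, the learning rate, and the epoch count are all invented for illustration and are not taken from the study; the authors' actual featurization, models, and data are not reproduced here.

```python
import math
from collections import Counter

def bag_of_words(text, vocab):
    """Map a report to token counts over a fixed vocabulary (unknown tokens are ignored)."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocab]

def predict_proba(x, w, b):
    """Logistic model: sigmoid of a weighted sum of the features."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights with plain batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [predict_proba(xi, w, b) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

def auc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented toy snippets (NOT study data); label 1 = ischemic stroke present.
train_reports = [
    "acute infarct in left mca territory",
    "restricted diffusion consistent with acute infarct",
    "no acute infarct",
    "no evidence of infarct or hemorrhage",
]
train_labels = [1, 1, 0, 0]
test_reports = ["acute infarct in right mca territory", "no hemorrhage or infarct"]
test_labels = [1, 0]

# The vocabulary is built from the training reports only.
vocab = sorted({tok for r in train_reports for tok in r.lower().split()})
X_train = [bag_of_words(r, vocab) for r in train_reports]
w, b = train_logistic_regression(X_train, train_labels)

test_scores = [predict_proba(bag_of_words(r, vocab), w, b) for r in test_reports]
held_out_auc = auc(test_scores, test_labels)
print("held-out AUC on toy data:", held_out_auc)
```

A toy set this small says nothing about real performance; it only shows how counting tokens, fitting a linear classifier, and ranking held-out reports fit together. The learned weights stay interpretable, since each one is tied to a single vocabulary token.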
Pages: 16
Related papers
50 in total
  • [21] Qualifying Certainty in Radiology Reports through Deep Learning-Based Natural Language Processing
    Liu, F.
    Zhou, P.
    Baccei, S. J.
    Masciocchi, M. J.
    Amornsiripanitch, N.
    Kiefe, C. I.
    Rosen, M. P.
    AMERICAN JOURNAL OF NEURORADIOLOGY, 2021, 42 (10) : 1755 - 1761
  • [22] Prediction of Stroke Outcome Using Natural Language Processing-Based Machine Learning of Radiology Report of Brain MRI
    Heo, Tak Sung
    Kim, Yu Seop
    Choi, Jeong Myeong
    Jeong, Yeong Seok
    Seo, Soo Young
    Lee, Jun Ho
    Jeon, Jin Pyeong
    Kim, Chulho
    JOURNAL OF PERSONALIZED MEDICINE, 2020, 10 (04): 1 - 11
  • [23] EXTRACTING STRUCTURED INFORMATION FROM PATHOLOGY REPORTS USING NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING
    Odisho, Anobel
    Park, Briton
    Altieri, Nicholas
    Murdoch, William
    Carroll, Peter
    Cooperberg, Matthew
    Yu, Bin
    JOURNAL OF UROLOGY, 2019, 201 (04): E1031 - E1032
  • [24] NATURAL LANGUAGE PROCESSING BASED MACHINE LEARNING MODEL USING CARDIAC MRI REPORTS TO IDENTIFY HYPERTROPHIC CARDIOMYOPATHY PATIENTS
    Sundaram, Divaakar Siva Baala
    Arunachalam, Shivaram P.
    Damani, Devanshi N.
    Farahani, Nasibeh Z.
    Enayati, Moein
    Pasupathy, Kalyan S.
    Arruda-Olson, Adelaide M.
    PROCEEDINGS OF THE 2021 DESIGN OF MEDICAL DEVICES CONFERENCE (DMD2021), 2021
  • [25] A systematic review of natural language processing applied to radiology reports
    Casey, Arlene
    Davidson, Emma
    Poon, Michael
    Dong, Hang
    Duma, Daniel
    Grivas, Andreas
    Grover, Claire
    Suarez-Paniagua, Victor
    Tobin, Richard
    Whiteley, William
    Wu, Honghan
    Alex, Beatrice
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (01)
  • [26] Natural language processing of radiology reports for the identification of patients with fracture
    Kolanu, Nithin
    Brown, A. Shane
    Beech, Amanda
    Center, Jacqueline R.
    White, Christopher P.
    ARCHIVES OF OSTEOPOROSIS, 2021, 16 (01)
  • [27] A systematic review of natural language processing applied to radiology reports
    Arlene Casey
    Emma Davidson
    Michael Poon
    Hang Dong
    Daniel Duma
    Andreas Grivas
    Claire Grover
    Víctor Suárez-Paniagua
    Richard Tobin
    William Whiteley
    Honghan Wu
    Beatrice Alex
    BMC Medical Informatics and Decision Making, 21
  • [28] Natural language processing of radiology reports for the identification of patients with fracture
    Nithin Kolanu
    A Shane Brown
    Amanda Beech
    Jacqueline R. Center
    Christopher P. White
    Archives of Osteoporosis, 2021, 16
  • [29] Deep Learning-based Assessment of Oncologic Outcomes from Natural Language Processing of Structured Radiology Reports
    Fink, Matthias A.
    Kades, Klaus
    Bischoff, Arved
    Moll, Martin
    Schnell, Merle
    Kuechler, Maike
    Koehler, Gregor
    Sellner, Jan
    Heussel, Claus Peter
    Kauczor, Hans-Ulrich
    Schlemmer, Heinz-Peter
    Maier-Hein, Klaus
    Weber, Tim F.
    Kleesiek, Jens
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2022, 4 (05)
  • [30] Automated Classification of Radiology Reports for Acute Lung Injury: Comparison of Keyword and Machine Learning Based Natural Language Processing Approaches
    Solti, Imre
    Cooke, Colin R.
    Xia, Fei
    Wurfel, Mark M.
    BIBMW: 2009 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOP, 2009, : 308 - +