Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports

Cited by: 51
Authors
Ong, Charlene Jennifer [1, 2, 3, 4]
Orfanoudaki, Agni [4]
Zhang, Rebecca [4]
Caprasse, Francois Pierre M. [4]
Hutch, Meghan [1, 2]
Ma, Liang [1]
Fard, Darian [1]
Balogun, Oluwafemi [1, 2]
Miller, Matthew I. [1]
Minnig, Margaret [1]
Saglam, Hanife [3]
Prescott, Brenton [2]
Greer, David M. [1, 2]
Smirnakis, Stelios [3]
Bertsimas, Dimitris [4, 5]
Affiliations
[1] Boston Univ, Sch Med, Boston, MA 02118 USA
[2] Boston Med Ctr, Boston, MA 02118 USA
[3] Harvard Med Sch, Boston, MA 02115 USA
[4] MIT, Operat Res Ctr, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[5] MIT, Sloan Sch Management, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
PLOS ONE | 2020 / Vol. 15 / No. 6
Keywords
ANNOTATION
DOI
10.1371/journal.pone.0234908
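Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]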
Subject classification codes
07; 0710; 09
Abstract
Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Rapid, accurate data extraction could substantially improve the identification of stroke in large datasets, the triage of critical clinical reports, and quality improvement efforts. In this study, we developed a comprehensive framework to evaluate the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods in determining the presence, location, and acuity of ischemic stroke from radiology report text. We collected 60,564 computed tomography (CT) and magnetic resonance imaging (MRI) radiology reports from 17,864 patients at two large academic medical centers. We used standard techniques to featurize the unstructured text and developed neurovascular-specific GloVe word embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of these reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for the three tasks (AUCs of 0.96, 0.98, and 0.93, respectively). With simpler NLP features (Bag of Words, BOW), the interpretable Logistic Regression algorithm performed best for identifying ischemic stroke (AUC 0.95), middle cerebral artery (MCA) location (AUC 0.96), and acuity (AUC 0.90). On the external test set, GloVe with Recurrent Neural Networks (AUCs 0.92, 0.89, and 0.93) also generalized better than BOW with Logistic Regression (AUCs 0.89, 0.86, and 0.80) for stroke presence, location, and acuity, respectively. Our study provides a comprehensive assessment of NLP techniques for unstructured radiology report text.
Our findings suggest that NLP/ML methods can be used to discriminate stroke features in large data cohorts for both clinical and research investigations.
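The simplest pipeline in the abstract (Bag of Words features, a Logistic Regression classifier, and AUC on a held-out split) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, plain batch gradient descent, the learning rate and epoch count, and all helper names (`tokenize`, `build_vocab`, `bag_of_words`, `train_logreg`, `auc`) are hypothetical choices, and the example reports are invented toy snippets rather than data from the study. GloVe embeddings and recurrent networks are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(docs):
    """Assign a column index to every token seen in the training reports."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(doc, vocab):
    """Sparse BOW vector {column: count}; tokens not in the vocabulary are dropped."""
    counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
    return {vocab[tok]: c for tok, c in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(x, w, b):
    """w . x + b for a sparse feature vector x."""
    return b + sum(w[j] * v for j, v in x.items())


def train_logreg(X, y, n_features, lr=0.5, epochs=200):
    """Batch gradient descent on the mean logistic (cross-entropy) loss, no regularization."""
    w, b, n = [0.0] * n_features, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, label in zip(X, y):
            err = sigmoid(linear_score(x, w, b)) - label  # dLoss/dScore for this report
            for j, v in x.items():
                grad_w[j] += err * v
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy snippets invented for this sketch (1 = ischemic stroke present).
# They are NOT reports or labels from the study.
train = [
    ("Acute infarct in the left MCA territory", 1),
    ("Restricted diffusion consistent with acute ischemic stroke", 1),
    ("Subacute infarct, right frontal lobe", 1),
    ("No acute intracranial abnormality", 0),
    ("No evidence of infarct or hemorrhage", 0),
    ("Chronic small vessel disease, no acute findings", 0),
]
test = [
    ("Acute ischemic infarct in the left parietal lobe", 1),
    ("No acute infarct identified", 0),
]

vocab = build_vocab(text for text, _ in train)
X_train = [bag_of_words(text, vocab) for text, _ in train]
y_train = [label for _, label in train]
w, b = train_logreg(X_train, y_train, len(vocab))

test_scores = [sigmoid(linear_score(bag_of_words(text, vocab), w, b)) for text, _ in test]
print("held-out AUC:", auc(test_scores, [label for _, label in test]))
```

The `auc` helper uses the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive report scores higher than a randomly chosen negative one, which equals the area under the ROC curve. Sparse dictionaries are used for the BOW vectors because any single report contains only a small fraction of the vocabulary; the vocabulary is built from training reports only, so unseen words in held-out reports are simply ignored.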
Pages: 16