Learning regular expressions for clinical text classification

Cited by: 66
Authors
Bui, Duy Duc An [1,2]
Zeng-Treitler, Qing [1,2]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT 84112 USA
[2] VA Salt Lake City Hlth Care Syst, Salt Lake City, UT USA
Keywords
RECORDS; SUPPORT;
DOI
10.1136/amiajnl-2013-002411
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objectives: Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification.

Methods: We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status, and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control.

Results: The two RED classifiers achieved 80.9-83.0% overall accuracy on the two datasets, which is 1.3-3% higher than the SVM's accuracy (p<0.001). Similarly, small but consistent improvements were observed in precision, recall, and F-measure when the RED classifiers were compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1-10.3% of the total instances and 43.8-53.0% of the SVM's misclassifications).

Conclusions: Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance.
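
To make the evaluation setup in the abstract more concrete, the following is a minimal sketch of a regular-expression-based snippet classifier scored with accuracy, precision, recall, and F-measure. It is not the RED algorithm, which the abstract does not specify: the patterns are hand-written stand-ins for machine-generated ones, and the names `PATTERNS`, `classify`, `precision_recall_f`, and the toy snippets are all hypothetical. It also scores a single toy set rather than running 10-fold cross-validation.

```python
import re

# ASSUMPTION: hand-written patterns standing in for machine-generated ones.
# The abstract does not describe what expressions RED actually produces.
PATTERNS = [
    ("non-smoker", re.compile(r"\b(?:never smoked|non-?smoker|denies smoking)\b", re.I)),
    ("past-smoker", re.compile(r"\b(?:quit smoking|former smoker|stopped smoking)\b", re.I)),
    ("smoker", re.compile(r"\b(?:smokes|current smoker|\d+ packs?)\b", re.I)),
]


def classify(snippet, default="unknown"):
    """Return the label of the first pattern that matches the snippet."""
    for label, pattern in PATTERNS:
        if pattern.search(snippet):
            return label
    return default


def precision_recall_f(gold, predicted, label):
    """Compute precision, recall, and F-measure (F1) for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Hypothetical toy snippets (not drawn from the SMOKE dataset).
SNIPPETS = [
    ("Patient never smoked.", "non-smoker"),
    ("Former smoker, quit smoking in 2005.", "past-smoker"),
    ("Smokes 1 pack per day.", "smoker"),
    ("He is a current smoker.", "smoker"),
    ("Tobacco use: heavy, ongoing.", "smoker"),  # no pattern covers this one
]

gold = [label for _, label in SNIPPETS]
predicted = [classify(text) for text, _ in SNIPPETS]
accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
precision, recall, f_measure = precision_recall_f(gold, predicted, "smoker")
print(accuracy, precision, recall, f_measure)
```

The last toy snippet is deliberately left uncovered by any pattern, so recall for the "smoker" class falls below precision. A fixed pattern set has this kind of coverage gap, which is one plausible reason to combine a regular-expression classifier with a statistical one, as the abstract's conclusion suggests.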
Pages: 850-857
Page count: 8