Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach

Cited by: 0
Authors
Tu, Chaofan [1]
Cui, Menglin [1]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Ningbo, Peoples R China
Keywords
simulated annealing; regular expression; medical text classification
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a rule-based engine composed of high-quality and interpretable regular expressions for medical text classification. The regular expressions are auto-generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods present high-quality performance in most Natural Language Processing (NLP) applications, the solutions are regarded as uninterpretable "black boxes" to humans. Therefore, rule-based methods are often introduced when interpretable solutions are needed, especially in the medical field. However, the construction of regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce the manual efforts while maintaining high-quality solutions. The Pool-based Simulated Annealing method is proposed to automatically optimize the performance of machine-generated regular expressions without human interference. The proposed method is tested on real-life data provided by one of China's largest online medical platforms. Experimental results show that the proposed PSA method further improves the performance of initial machine-generated regular expressions compared with other meta-heuristics such as Genetic Programming. We also believe that the proposed method can serve as a vital complementary tool for the existing machine learning approaches in text classification applications when high levels of interpretability of the solutions are required.
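The abstract does not spell out how the annealing loop operates, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a binary task in which a text is labelled positive when any regular expression in a rule set matches, and it interprets "pool" as a small set of candidate rule sets annealed side by side. The function names (`score`, `neighbour`, `pool_simulated_annealing`), the neighbourhood move, and all parameter values are hypothetical.

```python
import math
import random
import re


def score(patterns, samples):
    """Accuracy of a rule set: a text is predicted positive if any pattern matches."""
    correct = 0
    for text, label in samples:
        predicted = any(re.search(p, text) for p in patterns)
        correct += predicted == label
    return correct / len(samples)


def neighbour(patterns, vocabulary, rng):
    """Hypothetical move: replace one pattern with a word or a two-word alternation."""
    candidate = list(patterns)
    i = rng.randrange(len(candidate))
    words = rng.sample(vocabulary, rng.choice([1, 2]))
    candidate[i] = "|".join(re.escape(w) for w in words)
    return candidate


def pool_simulated_annealing(initial, samples, vocabulary, pool_size=4,
                             steps=200, t0=1.0, cooling=0.98, seed=0):
    """Anneal a pool of rule sets and return the best one seen, with its score."""
    rng = random.Random(seed)
    pool = [list(initial) for _ in range(pool_size)]
    best, best_score = list(initial), score(initial, samples)
    temperature = t0
    for _ in range(steps):
        for k in range(pool_size):
            current_score = score(pool[k], samples)
            candidate = neighbour(pool[k], vocabulary, rng)
            candidate_score = score(candidate, samples)
            delta = candidate_score - current_score
            # Metropolis rule: always accept improvements, and accept a worse
            # candidate with probability exp(delta / temperature).
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                pool[k] = candidate
                if candidate_score > best_score:
                    best, best_score = list(candidate), candidate_score
        temperature *= cooling  # geometric cooling schedule (an assumption)
    return best, best_score
```

Calling `pool_simulated_annealing(["fever"], samples, vocabulary)` with a list of `(text, label)` pairs and a list of at least two candidate words returns the best rule set found and its accuracy. Because the result stays a short list of readable patterns, it can be inspected directly, which is the interpretability argument the abstract makes.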
Pages: 7