Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach

Cited by: 0
Authors
Tu, Chaofan [1]
Cui, Menglin [1]
Affiliation
[1] Univ Nottingham, Sch Comp Sci, Ningbo, Peoples R China
Keywords
simulated annealing; regular expression; medical text classification
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification. The regular expressions are auto-generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods perform well in most Natural Language Processing (NLP) applications, their solutions are regarded as uninterpretable "black boxes" by humans. Rule-based methods are therefore often introduced when interpretable solutions are needed, especially in the medical field. However, constructing regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce the manual effort while maintaining high-quality solutions. The PSA method is proposed to automatically optimize the performance of machine-generated regular expressions without human intervention. The proposed method is tested on real-life data provided by one of China's largest online medical platforms. Experimental results show that, compared with other meta-heuristics such as Genetic Programming, the proposed PSA method further improves the performance of the initial machine-generated regular expressions. We also believe that the proposed method can serve as a vital complementary tool for existing machine learning approaches in text classification applications when high levels of interpretability are required.
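The general idea of annealing over a pool of candidate regular expressions can be illustrated with a minimal Python sketch. This is not the authors' PSA algorithm, whose details the abstract does not give; it is a generic simulated annealing loop that toggles patterns in and out of a rule set and scores the set by F1. The toy samples, the pattern pool, and every function name and parameter (`SAMPLES`, `POOL`, `predict`, `f1_score`, `anneal`, `t_start`, `cooling`) are assumptions made for illustration only.

```python
import math
import random
import re

# Hypothetical toy data: (text, label); label 1 = "respiratory complaint".
SAMPLES = [
    ("persistent cough and fever for three days", 1),
    ("dry cough at night", 1),
    ("shortness of breath after exercise", 1),
    ("knee pain after running", 0),
    ("skin rash on left arm", 0),
    ("fever and skin rash", 0),
]

# Hypothetical pool of candidate patterns (not taken from the paper).
POOL = [r"cough", r"fever", r"breath", r"pain", r"rash", r"night|days"]


def predict(selected, text):
    """Return 1 if any selected pool pattern matches the text, else 0."""
    return int(any(re.search(POOL[i], text) for i in selected))


def f1_score(selected):
    """F1 of the rule set (a set of pool indices) on the toy samples."""
    tp = fp = fn = 0
    for text, label in SAMPLES:
        pred = predict(selected, text)
        if pred and label:
            tp += 1
        elif pred and not label:
            fp += 1
        elif label and not pred:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def anneal(steps=500, t_start=1.0, cooling=0.99, seed=0):
    """Generic simulated annealing over subsets of POOL (assumed setup)."""
    rng = random.Random(seed)
    current = frozenset()
    current_score = f1_score(current)
    best, best_score = current, current_score
    temp = t_start
    for _ in range(steps):
        # Neighbor move: toggle one randomly chosen pattern in or out.
        candidate = current ^ {rng.randrange(len(POOL))}
        cand_score = f1_score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling  # geometric cooling schedule
    return sorted(POOL[i] for i in best), best_score


if __name__ == "__main__":
    patterns, score = anneal()
    print(patterns, score)
```

The selected patterns remain directly readable, which is the interpretability property the abstract emphasizes; a real system would score candidates on held-out data rather than on the samples used for the search.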
Pages: 7