Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming

Authors
Liu, Jiandong [1]
Bai, Ruibin [1]
Lu, Zheng [1]
Ge, Peiming [2]
Aickelin, Uwe [3]
Liu, Daoyun [2]
Affiliations
[1] Univ Nottingham Ningbo China, Sch Comp Sci, Ningbo, Peoples R China
[2] Ping An Hlth Cloud Co Ltd China, Technol Dept, Shanghai, Peoples R China
[3] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
Keywords
text classification; genetic programming; co-occurrence matrix; expert system
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the medical field, text classification is one of the most important tasks: it can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, it is hard for humans to understand the resulting classifiers or to fine-tune them manually for better precision and recall, because of their black-box nature. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) to evolve regular expressions capable of classifying a given medical text inquiry satisfactorily. Given a seed population of regular expressions (randomly initialized or manually constructed by experts), the method evolves the population using a novel regular expression syntax and a series of carefully chosen reproduction operators. The method is evaluated on real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, it generates classifiers that can be fully understood, checked and updated by medical doctors, which is crucial in medical practice.
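The evolutionary loop described in the abstract can be illustrated with a toy sketch. It is not the authors' method: the record does not give their regular expression syntax or reproduction operators, so everything below is an assumption made for illustration. Individuals are plain keyword alternations, fitness is the F1 score on a handful of invented inquiries, and the names `DATA`, `VOCAB`, `to_regex`, `f1`, `mutate`, `crossover` and `evolve` are all hypothetical.

```python
import random
import re

# Toy labelled inquiries, invented for illustration (not the paper's data).
DATA = [
    ("I have a fever and a bad cough", True),
    ("persistent cough with sore throat", True),
    ("high fever since last night", True),
    ("how do I renew my insurance card", False),
    ("what are the clinic opening hours", False),
    ("can I book an appointment online", False),
]
# Hypothetical vocabulary from which mutation draws terms.
VOCAB = ["fever", "cough", "throat", "card", "hours", "book", "online", "night"]


def to_regex(terms):
    """Compile a set of terms into an alternation such as 'cough|fever'."""
    return re.compile("|".join(re.escape(t) for t in sorted(terms)))


def f1(terms):
    """F1 score on DATA; a regex match predicts the positive class."""
    if not terms:
        return 0.0
    rx = to_regex(terms)
    tp = sum(1 for text, y in DATA if y and rx.search(text))
    fp = sum(1 for text, y in DATA if not y and rx.search(text))
    fn = sum(1 for text, y in DATA if y and not rx.search(text))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mutate(terms, rng):
    """Toggle one random vocabulary term (add it if absent, drop it if present)."""
    child = set(terms)
    child.symmetric_difference_update({rng.choice(VOCAB)})
    return frozenset(child)


def crossover(a, b, rng):
    """Inherit each term found in either parent with probability 0.5."""
    return frozenset(t for t in sorted(a | b) if rng.random() < 0.5)


def evolve(seed_population, generations=30, size=12, seed=0):
    """Evolve term sets from a seed population of at least two individuals."""
    rng = random.Random(seed)
    pop = [frozenset(p) for p in seed_population]
    for _ in range(generations):
        pop.sort(key=f1, reverse=True)
        elite = pop[: max(2, size // 3)]  # elitism: the best individuals survive
        children = []
        while len(elite) + len(children) < size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return max(pop, key=f1)


if __name__ == "__main__":
    seeds = [{"card"}, {"night"}, {"hours", "fever"}]
    best = evolve(seeds)
    print(to_regex(best).pattern, f1(best))
```

Because the elite is carried over unchanged each generation, the best fitness never drops below that of the seed population. The evolved pattern stays a readable alternation that an expert could inspect and edit by hand, which is the interpretability property the abstract emphasizes.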
Pages: 8