A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization

Cited by: 0
|
Authors
Nurfadhlina Mohd Sharef
Trevor Martin
Khairul Azhar Kasmiran
Aida Mustapha
Md. Nasir Sulaiman
Masrah Azrifah Azmi-Murad
Affiliation
[1] University of Putra Malaysia, Faculty of Computer Science and Information Technology
Source
Soft Computing | 2015 / Vol. 19
Keywords
Text categorization; Text expression; Evolving fuzzy grammar; Machine learning; Incidents; Medical;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Several methods have been studied for text categorization, and most are inspired by the statistical distribution of features in the texts, as in Machine Learning (ML) methods. However, no available work investigates the performance of ML-based methods against a text expression-based method, especially for incident and medical case categorization. Meanwhile, these two domains are becoming ever more popular, owing to growing interest in automation for security intelligence and health services. This paper presents a text expression-based method called Evolving Fuzzy Grammar (EFG) and evaluates its performance against the conventional ML methods of Naïve Bayes, support vector machine, k-nearest neighbor, adaptive boosting, and decision tree. The incident dataset is a real dataset taken from the World Incidents Tracking System, while ImageCLEF 2009 was the source of the radiology case reports. Evaluated with standard measures (recall, precision, and F-measure), the results showed that each method has varying strengths and weaknesses across the two categorization tasks. In both domains, the SMO and IBk methods were the best, while AdaBoost was the worst. It was also observed that the medical dataset was easier to categorize than the incident dataset. Although EFG was ranked second lowest, it obtained the highest precision score in the bombing categorization and the highest recall score in the armed attack categorization, and on average ranked in the top three for the medical case categorization. It was also noted that the text expression-based method used in EFG was the most verbose and expressive when compared with the ML methods. This indicates that EFG is a viable method for text categorization and may serve as an alternative approach to such a task.
Pages: 1701 - 1714
Page count: 13
Related papers
50 in total
  • [11] Text Classification: How Machine Learning Is Revolutionizing Text Categorization
    Allam, Hesham
    Makubvure, Lisa
    Gyamfi, Benjamin
    Graham, Kwadwo Nyarko
    Akinwolere, Kehinde
    INFORMATION, 2025, 16 (02)
  • [12] A comparative study on supervised and unsupervised learning approaches for multilingual text categorization
    Lee, Chung-Hong
    Yang, Hsin-Chang
    Chen, Ting-Chung
    Ma, Sheng-Min
    ICICIC 2006: FIRST INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING, INFORMATION AND CONTROL, VOL 2, PROCEEDINGS, 2006, : 511 - +
  • [13] A comparative study on text representation schemes in text categorization
    Song, FX
    Liu, SH
    Yang, JY
    PATTERN ANALYSIS AND APPLICATIONS, 2005, 8 (1-2) : 199 - 209
  • [14] A comparative study on text representation schemes in text categorization
    Fengxi Song
    Shuhai Liu
    Jingyu Yang
    Pattern Analysis and Applications, 2005, 8 : 199 - 209
  • [15] Machine Learning in Evolving Connectionist Text Summarizer
    Prasad, Rajesh S.
    Kulkarni, U. V.
    Prasad, Jayashree R.
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION IN COMMUNICATION, 2009, : 539 - +
  • [16] A Comparative Study of Machine Learning Techniques in Healthcare
    Jain, Divik
    Kadecha, Brijesh
    Iyer, Sailesh
    PROCEEDINGS OF THE 2019 6TH INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2019, : 455 - 460
  • [17] Arabic Text Categorization using Machine Learning Approaches
    Alshammari, Riyad
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (03) : 226 - 230
  • [18] Text categorization based on regularization extreme learning machine
    Zheng, Wenbin
    Qian, Yuntao
    Lu, Huijuan
    NEURAL COMPUTING & APPLICATIONS, 2013, 22 (3-4) : 447 - 456
  • [19] Text categorization based on regularization extreme learning machine
    Wenbin Zheng
    Yuntao Qian
    Huijuan Lu
    Neural Computing and Applications, 2013, 22 : 447 - 456
  • [20] Text Fragment Extraction using Incremental Evolving Fuzzy Grammar Fragments Learner
    Sharef, Nurfadhlina Mohd
    Shen, Yun
    2010 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2010), 2010,