A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization

Authors
Nurfadhlina Mohd Sharef
Trevor Martin
Khairul Azhar Kasmiran
Aida Mustapha
Md. Nasir Sulaiman
Masrah Azrifah Azmi-Murad
Affiliation
[1] University of Putra Malaysia, Faculty of Computer Science and Information Technology
Source
Soft Computing | 2015 / Vol. 19
Keywords
Text categorization; Text expression; Evolving fuzzy grammar; Machine learning; Incidents; Medical;
Abstract
Several methods have been studied for text categorization, and most are inspired by the statistical distribution of features in the texts, as in Machine Learning (ML) methods. However, no available work compares the performance of ML-based methods against a text expression-based method, especially for incident and medical case categorization. Meanwhile, these two domains are becoming ever more popular, owing to growing interest in automation for security intelligence and health services. This paper presents a text expression-based method called Evolving Fuzzy Grammar (EFG) and evaluates its performance against the conventional ML methods of Naïve Bayes, support vector machine, k-nearest neighbor, adaptive boosting, and decision tree. The incident dataset is a real dataset taken from the World Incidents Tracking System, while ImageCLEF 2009 was the source for radiology case reports. The methods were compared using standard evaluation measures (recall, precision, and F-measure), and the results suggested that each method had different strengths and weaknesses in the two categorization tasks. In both domains, the SMO and IBk methods (the support vector machine and k-nearest neighbor classifiers) were the best, while AdaBoost was the worst. It was also observed that the medical dataset was easier to categorize than the incident dataset. Although EFG was ranked second lowest, it obtained the highest precision score in the bombing categorization, the highest recall score in the armed attack categorization, and ranked in the top three on average for the medical case categorization. It was also noted that the text expression-based method used in EFG was the most verbose and expressive when compared to the ML methods. This indicates that EFG is a viable method for text categorization and may serve as an alternative approach to such a task.
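The evaluation measures named in the abstract (recall, precision, and F-measure) can be illustrated with a minimal sketch. It assumes single-label categorization and the balanced F-measure (F1); the function name `per_class_scores` and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter


def per_class_scores(gold, predicted):
    """Return {label: (precision, recall, f_measure)} for single-label data."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correct assignment for label g
        else:
            fp[p] += 1      # label p was assigned wrongly
            fn[g] += 1      # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr = precision + recall
        # Balanced F-measure (F1): harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / pr if pr else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


# Toy labels invented for illustration only (not from the paper's datasets).
gold = ["bombing", "bombing", "armed attack", "armed attack", "bombing"]
pred = ["bombing", "armed attack", "armed attack", "armed attack", "bombing"]
scores = per_class_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy case one category reaches perfect precision while the other reaches perfect recall, which is why a method can lead on a single per-category measure without leading overall.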
Pages: 1701-1714
Number of pages: 13