A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization

Cited by: 0

Authors:

Nurfadhlina Mohd Sharef

Trevor Martin

Khairul Azhar Kasmiran

Aida Mustapha

Md. Nasir Sulaiman

Masrah Azrifah Azmi-Murad

Affiliations:

[1] University of Putra Malaysia, Faculty of Computer Science and Information Technology

Source:

Soft Computing | 2015 / Vol. 19

Keywords:

Text categorization; Text expression; Evolving fuzzy grammar; Machine learning; Incidents; Medical;

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Several methods have been studied for text categorization, and most are inspired by the statistical distribution of features in the texts, as in Machine Learning (ML) methods. However, no available work compares the performance of ML-based methods against a text expression-based method, especially for incident and medical case categorization. Meanwhile, these two domains are becoming ever more popular, due to a growing interest in automation in security intelligence and health services. This paper presents a text expression-based method called Evolving Fuzzy Grammar (EFG) and evaluates its performance against the conventional ML methods of Naïve Bayes, support vector machine, k-nearest neighbor, adaptive boosting, and decision tree. The incident dataset is a real dataset taken from the World Incidents Tracking System, while ImageCLEF 2009 was the source for radiology case reports. The results, evaluated with standard measures (recall, precision, and F-measure), showed that each method had different strengths and weaknesses in the two categorization tasks. In both domains, the SMO and IBk methods were the best, while AdaBoost was the worst. It was also observed that the medical dataset was easier to categorize than the incident dataset. Although EFG was ranked second lowest, it obtained the highest precision score in the bombing categorization, the highest recall score in the armed attack categorization, and ranked in the top three on average for the medical case categorization. It was also noted that the text expression-based method used in EFG was the most verbose and expressive when compared to the ML methods. This indicates that EFG is a viable method for text categorization and may serve as an alternative approach to such a task.
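
The abstract names recall, precision, and F-measure as the evaluation measures but does not spell them out. The following is a minimal sketch of how such per-category scores are commonly computed for single-label predictions. It assumes the balanced F-measure (F1); the function name `per_class_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this category, but it was wrong
            fn[truth] += 1  # the true category was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        # Balanced F-measure (F1): harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


# Hypothetical toy labels for illustration only, not data from the paper.
y_true = ["bombing", "bombing", "armed_attack", "armed_attack", "bombing"]
y_pred = ["bombing", "armed_attack", "armed_attack", "armed_attack", "bombing"]
scores = per_class_scores(y_true, y_pred)
```

On this toy input, a category can have perfect precision with lower recall (or the reverse), which is the kind of per-category difference the abstract reports for EFG on the bombing and armed attack categories.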

Pages: 1701-1714

Page count: 13

Citation: Sharef, Nurfadhlina Mohd; Martin, Trevor; Kasmiran, Khairul Azhar; Mustapha, Aida; Sulaiman, Md Nasir; Azmi-Murad, Masrah Azrifah. A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization. Soft Computing, 2015, 19 (06): 1701-1714.