KerMinSVM for imbalanced datasets with a case study on Arabic comics classification

Cited by: 5

Authors:

Nayal, Ammar [1]

Jomaa, Hadi [1]

Awad, Marlette [1]

Affiliation:

[1] Amer Univ Beirut, Dept Elect & Comp Engn, Beirut, Lebanon

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2017 / Vol. 59

Funding:

National Research Foundation, Singapore;

Keywords:

Imbalance datasets; Support vector machines; Arabic comics analysis; Natural language processing; Supervised classification;

DOI:

10.1016/j.engappai.2017.01.001

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Many studies have been performed to classify large text documents using different classifiers, ranging from simple distance classifiers such as K-Nearest-Neighbor (KNN) to more advanced classifiers such as Support Vector Machines (SVM). Traditional approaches fail when a short text is encountered, because the limited number of words makes the feature representation sparse. Another common problem in text classification is class imbalance (CI). CI occurs when one class of the data contains most of the samples while the other class contains only a few. Standard classifiers, when applied to imbalanced data, yield high accuracy for the majority class and low accuracy for the minority one. Motivated by the need for a framework for classifying the content of Arabic comics, we propose KerMinSVM, a kernel extension of our previously proposed MinSVM, coupled with a new feature dimensionality reduction scheme based on word root frequency ratios (WRFR). KerMinSVM was tested on multiple imbalanced benchmark datasets, and the results were verified using three measures: accuracy, F-measure, and statistical analysis. WRFR was applied to a manually constructed Arabic comic text dataset to detect strong content in children's comic books. Test results revealed that our proposed framework outperformed most of the methods for imbalanced datasets and short text classification.


Pages: 159-169

Page count: 11

50 items in total

[41] Robustness of Image Classification on Imbalanced Datasets Using Capsules Networks
Onana, Steve
Tchuani, Diane
Tinku, Claude
Fippo, Louis
Kouamou, Georges Edouard
RESEARCH IN COMPUTER SCIENCE, CRI 2023, 2024, 2085 : 53 - 68
[42] Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias
Haydemar Núñez
Luis Gonzalez-Abril
Cecilio Angulo
Journal of Classification, 2017, 34 : 427 - 443
[43] Coping with highly imbalanced datasets: A case study with definition extraction in a multilingual setting
Del Gaudio, Rosa, 1600, Cambridge University Press (20):
[44] Coping with highly imbalanced datasets: A case study with definition extraction in a multilingual setting
Del Gaudio, Rosa
Batista, Gustavo
Branco, Antonio
NATURAL LANGUAGE ENGINEERING, 2014, 20 (03) : 327 - 359
[45] Study of Multi-Class Classification Algorithms' Performance on Highly Imbalanced Network Intrusion Datasets
Bulavas, Viktoras
Marcinkevicius, Virginijus
Ruminski, Jacek
INFORMATICA, 2021, 32 (03) : 441 - 475
[46] Handling Imbalanced Datasets in the Case of Credit Card Fraud
Ounacer, Soumaya
Jihal, Houda
Bayoude, Kenza
Daif, Abderrahmane
Azzouazi, Mohamed
ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 1, 2022, 1417 : 666 - 678
[47] Weighted Conditional Mutual Information Based Boosting for Classification of Imbalanced Datasets
Utasi, Akos
2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2711 - 2714
[48] Granular Classification for Imbalanced Datasets: A Minkowski Distance-Based Method
Fu, Chen
Yang, Jianhua
ALGORITHMS, 2021, 14 (02)
[49] Sparse Matrix Classification on Imbalanced Datasets Using Convolutional Neural Networks
Pichel, Juan C.
Pateiro-Lopez, Beatriz
IEEE ACCESS, 2019, 7 : 82377 - 82389
[50] Performance of SVM with Multiple Kernel Learning for Classification Tasks of Imbalanced Datasets
Saeed, Sana
Ong, Hong Choon
PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2019, 27 (01) : 527 - 545
