KerMinSVM for imbalanced datasets with a case study on Arabic comics classification

Cited by: 5
Authors
Nayal, Ammar [1]
Jomaa, Hadi [1]
Awad, Mariette [1]
Affiliation
[1] American University of Beirut, Department of Electrical and Computer Engineering, Beirut, Lebanon
Funding
National Research Foundation Singapore
Keywords
Imbalanced datasets; Support vector machines; Arabic comics analysis; Natural language processing; Supervised classification
DOI
10.1016/j.engappai.2017.01.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many studies have classified large text documents using different classifiers, ranging from simple distance-based classifiers such as K-Nearest-Neighbor (KNN) to more advanced classifiers such as Support Vector Machines (SVMs). Traditional approaches fail on short texts, because the limited number of words produces sparse feature representations. Another common problem in text classification is class imbalance (CI), which occurs when one class contains most of the samples while the other class contains only a few. Standard classifiers applied to imbalanced data achieve high accuracy on the majority class but low accuracy on the minority class. Motivated by the task of classifying the content of Arabic comics, we propose a novel framework: KerMinSVM, a kernel extension of our previously proposed MinSVM, coupled with a new feature dimensionality reduction scheme based on word root frequency ratios (WRFR). KerMinSVM was tested on multiple imbalanced benchmark datasets, and the results were evaluated using accuracy, F-measure, and statistical analysis. WRFR was applied to a manually constructed Arabic comic text dataset to detect strong content in children's comic books. Test results revealed that the proposed framework outperformed most competing methods for imbalanced datasets and short text classification.
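The abstract's claim that standard classifiers reach high majority-class accuracy but low minority-class accuracy on imbalanced data, and its use of the F-measure alongside accuracy, can be illustrated with a small worked example. This is a minimal sketch, not the authors' KerMinSVM or WRFR code: the 95/5 class split, the two prediction vectors, and the helper names `accuracy` and `f_measure` are hypothetical, and the F-measure is computed as the standard F1 score on the minority class.

```python
# Hypothetical illustration of why accuracy alone is misleading under class
# imbalance (CI). The data below is invented for this example, not from the paper.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: the minority class is never recovered
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical dataset: 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5

# Trivial classifier that always predicts the majority class.
y_majority = [0] * 100

# Hypothetical better classifier: 2 false positives, finds 4 of the 5 minority samples.
y_better = [0] * 93 + [1] * 2 + [1, 1, 1, 1, 0]

print(f"majority-only: accuracy={accuracy(y_true, y_majority):.2f}, "
      f"F-measure={f_measure(y_true, y_majority):.2f}")  # 0.95, 0.00
print(f"better:        accuracy={accuracy(y_true, y_better):.2f}, "
      f"F-measure={f_measure(y_true, y_better):.2f}")    # 0.97, 0.73
```

Accuracy barely moves between the two classifiers (0.95 versus 0.97), while the minority-class F-measure jumps from 0.00 to about 0.73, which is why imbalance studies typically report the F-measure rather than accuracy alone.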
Pages: 159-169
Number of pages: 11
Related papers (50 in total)
  • [41] Robustness of Image Classification on Imbalanced Datasets Using Capsules Networks
    Onana, Steve
    Tchuani, Diane
    Tinku, Claude
    Fippo, Louis
    Kouamou, Georges Edouard
    RESEARCH IN COMPUTER SCIENCE, CRI 2023, 2024, 2085 : 53 - 68
  • [42] Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias
    Núñez, Haydemar
    Gonzalez-Abril, Luis
    Angulo, Cecilio
    JOURNAL OF CLASSIFICATION, 2017, 34 : 427 - 443
  • [44] Coping with highly imbalanced datasets: A case study with definition extraction in a multilingual setting
    Del Gaudio, Rosa
    Batista, Gustavo
    Branco, Antonio
    NATURAL LANGUAGE ENGINEERING, 2014, 20 (03) : 327 - 359
  • [45] Study of Multi-Class Classification Algorithms' Performance on Highly Imbalanced Network Intrusion Datasets
    Bulavas, Viktoras
    Marcinkevicius, Virginijus
    Ruminski, Jacek
    INFORMATICA, 2021, 32 (03) : 441 - 475
  • [46] Handling Imbalanced Datasets in the Case of Credit Card Fraud
    Ounacer, Soumaya
    Jihal, Houda
    Bayoude, Kenza
    Daif, Abderrahmane
    Azzouazi, Mohamed
    ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 1, 2022, 1417 : 666 - 678
  • [47] Weighted Conditional Mutual Information Based Boosting for Classification of Imbalanced Datasets
    Utasi, Akos
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2711 - 2714
  • [48] Granular Classification for Imbalanced Datasets: A Minkowski Distance-Based Method
    Fu, Chen
    Yang, Jianhua
    ALGORITHMS, 2021, 14 (02)
  • [49] Sparse Matrix Classification on Imbalanced Datasets Using Convolutional Neural Networks
    Pichel, Juan C.
    Pateiro-Lopez, Beatriz
    IEEE ACCESS, 2019, 7 : 82377 - 82389
  • [50] Performance of SVM with Multiple Kernel Learning for Classification Tasks of Imbalanced Datasets
    Saeed, Sana
    Ong, Hong Choon
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2019, 27 (01) : 527 - 545