Hybrid Feature Selection for Amharic News Document Classification

Cited by: 4
Authors
Endalie, Demeke [1]
Haile, Getamesay [1]
Affiliations
[1] Jimma Inst Technol, Fac Comp & Informat, Jimma, Ethiopia
Keywords
Text processing; Feature selection; Information retrieval systems
DOI
10.1155/2021/5516262
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
The number of Amharic digital documents has grown rapidly, which makes automatic text classification increasingly important. Proper feature selection plays a crucial role in both classification accuracy and computational time, particularly when the initial feature set is large. In this paper, we present a hybrid feature selection method, called IGCHIDF, which combines the information gain (IG), chi-square (CHI), and document frequency (DF) feature selection methods. We evaluate the proposed method on two datasets: dataset 1, containing 9 news categories, and dataset 2, containing 13 news categories. Our experimental results show that the proposed method performs better than the other methods on both datasets 1 and 2. On dataset 2, the classification accuracy of IGCHIDF is up to 3.96% higher than that of IG, up to 11.16% higher than that of CHI, and 7.3% higher than that of DF.
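The abstract does not say how IGCHIDF combines the three criteria, so the following is only a minimal sketch under an assumed rule: score every term by DF, CHI, and IG, then keep the union of the top-k terms under each score. The function names `term_scores` and `hybrid_select`, the union rule, and the toy English corpus are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def term_scores(docs, labels):
    """Return {term: (df, chi, ig)} for tokenized docs with one label each."""
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    h_class = entropy(class_count.values(), n)

    scores = {}
    for t in vocab:
        # Per-class number of documents that contain the term.
        present = Counter(lab for s, lab in zip(doc_sets, labels) if t in s)
        df = sum(present.values())
        absent = [class_count[c] - present.get(c, 0) for c in classes]

        # IG = H(C) - P(t) H(C | t) - P(not t) H(C | not t)
        ig = h_class
        if df:
            ig -= (df / n) * entropy(present.values(), df)
        if n - df:
            ig -= ((n - df) / n) * entropy(absent, n - df)

        # CHI: largest 2x2 chi-square statistic over all classes.
        chi = 0.0
        for c in classes:
            a = present.get(c, 0)        # term present, class c
            b = df - a                   # term present, other classes
            e = class_count[c] - a       # term absent, class c
            d = n - df - e               # term absent, other classes
            denom = (a + e) * (b + d) * (a + b) * (e + d)
            if denom:
                chi = max(chi, n * (a * d - e * b) ** 2 / denom)

        scores[t] = (df, chi, ig)
    return scores


def hybrid_select(scores, k):
    """Assumed combination rule: union of the top-k terms per criterion."""
    selected = set()
    for i in range(3):  # 0 = DF, 1 = CHI, 2 = IG
        ranked = sorted(scores, key=lambda t: (-scores[t][i], t))
        selected.update(ranked[:k])
    return selected


# Toy corpus (hypothetical data, two classes).
docs = [
    ["sport", "goal", "the"],
    ["sport", "match", "the"],
    ["bank", "loan", "the"],
    ["bank", "market", "the"],
]
labels = ["sport", "sport", "econ", "econ"]

scores = term_scores(docs, labels)
print(sorted(hybrid_select(scores, 2)))
```

Note that DF alone favours very frequent terms such as "the", which carry no class information (its CHI and IG are both zero here). That contrast is one reason a hybrid of frequency-based and class-aware criteria may behave differently from any single criterion.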
Pages: 8
Related papers
50 in total
  • [31] Protein Classification Using Hybrid Feature Selection Technique
    Singh, Upendra
    Tripathi, Sudhakar
    SMART TRENDS IN INFORMATION TECHNOLOGY AND COMPUTER COMMUNICATIONS, SMARTCOM 2016, 2016, 628 : 813 - 821
  • [32] Hybrid feature selection model for classification of lung disorders
    Vivekanandan Dharmalingam
    Dhananjay Kumar
    Journal of Ambient Intelligence and Humanized Computing, 2022, 13 : 5609 - 5625
  • [33] Hybrid feature selection model for classification of lung disorders
    Dharmalingam, Vivekanandan
    Kumar, Dhananjay
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 13 (12) : 5609 - 5625
  • [34] Traditional and Swarm Intelligent Based Text Feature Selection for Document Classification
    Kyaw, Khin Sandar
    Limsiroratana, Somchai
    ISCIT 2019: PROCEEDINGS OF 2019 19TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT), 2019, : 226 - 231
  • [35] On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis
    Pratiwi, Asriyanti Indah
    Adiwijaya
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2018, 2018
  • [36] An improved document classification approach with maximum entropy and entropy feature selection
    Pang, Xiu-Li
    Feng, Yu-Qiang
    Jiang, Wei
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 3911 - +
  • [37] Fine-Tuning BERT Models for Multiclass Amharic News Document Categorization
    Endalie, Demeke
    COMPLEXITY, 2025, 2025 (01)
  • [38] FE-TAC: an effective document classification method combining feature extraction and feature selection
    Singh K.N.
    Devi H.M.
    Mahant A.K.
    Dorendro A.
    International Journal of Applied Decision Sciences, 2023, 16 (06) : 717 - 740
  • [39] Feature selection and document clustering
    Dhillon, I
    Kogan, J
    Nicholas, C
    SURVEY OF TEXT MINING: CLUSTERING, CLASSIFICATION, AND RETRIEVAL, 2004, : 73 - 100
  • [40] Hybrid Feature Selection Based on Principal Component Analysis and Grey Wolf Optimizer Algorithm for Arabic News Article Classification
    Alomari, Osama Ahmad
    Elnagar, Ashraf
    Afyouni, Imad
    Shahin, Ismail
    Nassif, Ali Bou
    Hashem, Ibrahim Abaker
    Tubishat, Mohammad
    IEEE ACCESS, 2022, 10 : 121816 - 121830