Text classification using genetic algorithm oriented latent semantic features

Cited by: 72
Authors
Uysal, Alper Kursat [1 ]
Gunal, Serkan [1 ]
Affiliations
[1] Anadolu Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Feature selection; Genetic algorithm; Latent semantic indexing; Text classification; FEATURE-SELECTION METHOD; CATEGORIZATION; LSI;
DOI
10.1016/j.eswa.2014.03.041
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, genetic algorithm oriented latent semantic features (GALSF) are proposed to obtain a better representation of documents in text classification. The proposed approach consists of feature selection and feature transformation stages. The first stage is carried out using state-of-the-art filter-based methods. The second stage employs latent semantic indexing (LSI) empowered by a genetic algorithm such that a better projection is attained using appropriate singular vectors, which are not limited to the ones corresponding to the largest singular values, unlike the standard LSI approach. In this way, singular vectors with small singular values may also be used for projection, whereas vectors with large singular values may be eliminated to obtain better discrimination. Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions. (C) 2014 Elsevier Ltd. All rights reserved.
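The second stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nearest-centroid fitness, the fixed subset size `k`, and the GA settings (tournament selection, uniform crossover, bit-flip mutation, elitism) are all assumptions chosen for brevity. The input matrix is assumed to have already passed through the first stage (filter-based feature selection), with one row per document and one column per selected term.

```python
import numpy as np

def centroid_accuracy(Z_train, y_train, Z_val, y_val):
    """Nearest-centroid accuracy; a stand-in fitness, not the paper's classifier."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_val).mean())

def repair(mask, k, rng):
    """Force a boolean mask to select exactly k singular vectors."""
    mask = mask.copy()
    on, off = np.flatnonzero(mask), np.flatnonzero(~mask)
    if on.size > k:
        mask[rng.choice(on, on.size - k, replace=False)] = False
    elif on.size < k:
        mask[rng.choice(off, k - on.size, replace=False)] = True
    return mask

def ga_select_singular_vectors(X_train, y_train, X_val, y_val, k,
                               pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for k right singular vectors that maximize validation accuracy."""
    rng = np.random.default_rng(seed)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
    r = Vt.shape[0]

    def fitness(mask):
        P = Vt[mask].T  # projection matrix, shape (n_terms, k)
        return centroid_accuracy(X_train @ P, y_train, X_val @ P, y_val)

    # Seed the population with the standard-LSI choice (top-k) plus random subsets.
    top_k = np.zeros(r, dtype=bool)
    top_k[:k] = True
    pop = [top_k] + [repair(np.zeros(r, dtype=bool), k, rng)
                     for _ in range(pop_size - 1)]

    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        elite = [pop[i] for i in np.argsort(-scores)[:2]]  # keep the two best
        children = []
        while len(children) < pop_size - len(elite):
            # Binary tournament selection for each parent.
            a, b = (pop[max(rng.choice(pop_size, 2, replace=False),
                            key=lambda i: scores[i])] for _ in range(2))
            child = np.where(rng.random(r) < 0.5, a, b)  # uniform crossover
            child ^= rng.random(r) < p_mut               # bit-flip mutation
            children.append(repair(child, k, rng))
        pop = elite + children

    scores = np.array([fitness(m) for m in pop])
    best = pop[int(scores.argmax())]
    return best, float(scores.max()), fitness(top_k)
```

The function returns the selected mask, its fitness, and the fitness of the standard top-`k` LSI projection for comparison. Because the top-`k` mask is in the initial population and the best individuals are carried forward, the returned fitness cannot fall below the LSI baseline on the validation split; whether that gain carries over to unseen data would need a separate test set.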
Pages: 5938-5947
Number of pages: 10
Related papers
50 in total
  • [1] Genetic algorithm for text clustering based on latent semantic indexing
    Song, Wei
    Park, Soon Cheol
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2009, 57 (11-12) : 1901 - 1907
  • [2] Improving text classification using local latent semantic indexing
    Liu, T
    Chen, H
    Zhang, BY
    Ma, WY
    Wu, GY
    FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 162 - 169
  • [3] Evaluation of text semantic features using latent dirichlet allocation model
    Zhou C.
    Li N.
    Zhang C.
    Yang X.
    International Journal of Performability Engineering, 2020, 16 (06) : 968 - 978
  • [4] Boosting for text classification with semantic features
    Bloehdorn, Stephan
    Hotho, Andreas
    ADVANCES IN WEB MINING AND WEB USAGE ANALYSIS, 2006, 3932 : 149 - 166
  • [5] THE APPLICATION OF LATENT SEMANTIC INDEXING AND ONTOLOGY IN TEXT CLASSIFICATION
    Yang, Xi-Quan
    Sun, Na
    Sun, Tie-Li
    Cao, Xue-Ya
    Zheng, Xiao-Juan
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (12A): : 4491 - 4499
  • [6] A neuro-SVM model for text classification using latent semantic indexing
    Mitra, V
    Wang, CJ
    Banerjee, S
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005, : 564 - 569
  • [7] Text Classification using Gated Fusion of n-gram Features and Semantic Features
    Nagar, Ajay
    Bhasin, Anmol
    Mathur, Gaurav
    COMPUTACION Y SISTEMAS, 2019, 23 (03): : 1015 - 1020
  • [8] Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus
    A. Kabán
    M. A. Girolami
    Neural Processing Letters, 2002, 15 : 31 - 43
  • [9] Fast extraction of semantic features from a latent semantic indexed text corpus
    Kabán, A
    Girolami, MA
    NEURAL PROCESSING LETTERS, 2002, 15 (01) : 31 - 34
  • [10] Transductive learning for short-text classification problems using latent semantic indexing
    Zelikovitz, S
    Marquez, F
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2005, 19 (02) : 143 - 163