Text classification using genetic algorithm oriented latent semantic features

Cited by: 72
Authors
Uysal, Alper Kursat [1 ]
Gunal, Serkan [1 ]
Affiliations
[1] Anadolu Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Feature selection; Genetic algorithm; Latent semantic indexing; Text classification; FEATURE-SELECTION METHOD; CATEGORIZATION; LSI;
DOI
10.1016/j.eswa.2014.03.041
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, genetic algorithm oriented latent semantic features (GALSF) are proposed to obtain a better representation of documents in text classification. The proposed approach consists of a feature selection stage and a feature transformation stage. The first stage is carried out using state-of-the-art filter-based methods. The second stage employs latent semantic indexing (LSI) empowered by a genetic algorithm, so that a better projection is attained using appropriate singular vectors. Unlike the standard LSI approach, these vectors are not limited to the ones corresponding to the largest singular values: singular vectors with small singular values may be used for projection, and vectors with large singular values may be eliminated, to obtain better discrimination. Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions. © 2014 Elsevier Ltd. All rights reserved.
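The second stage described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the chromosome encoding (a binary mask over singular vectors), the fitness function (training accuracy of a nearest-centroid classifier), the GA operators, and the names `ga_select_vectors`, `fitness`, and `nearest_centroid_accuracy` are all assumptions made for illustration, and the filter-based feature selection stage is assumed to have already produced the document-term matrix `X`.

```python
import numpy as np

def nearest_centroid_accuracy(Z, y):
    """Training accuracy of a nearest-centroid classifier (toy fitness, assumed)."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[d.argmin(axis=1)] == y).mean())

def fitness(mask, X, y, V):
    """Score a subset of singular vectors; an empty subset scores zero."""
    if not mask.any():
        return 0.0
    return nearest_centroid_accuracy(X @ V[:, mask], y)

def ga_select_vectors(X, y, k, pop_size=20, generations=25, p_mut=0.1, seed=0):
    """Search for a subset of right singular vectors of X (documents x terms).

    Returns (best_mask, best_fitness, lsi_fitness), where lsi_fitness is the
    score of standard LSI using the k largest singular values.
    """
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows sorted by singular value
    V = Vt.T
    r = V.shape[1]
    lsi_mask = np.arange(r) < k            # standard LSI: keep the top-k vectors
    pop = rng.random((pop_size, r)) < 0.5  # random binary chromosomes
    pop[0] = lsi_mask                      # seed the population with the LSI baseline
    for _ in range(generations):
        scores = np.array([fitness(m, X, y, V) for m in pop])
        children = [pop[scores.argmax()].copy()]       # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.integers(pop_size, size=2)      # tournament for parent 1
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            c, d = rng.integers(pop_size, size=2)      # tournament for parent 2
            p2 = pop[c] if scores[c] >= scores[d] else pop[d]
            child = np.where(rng.random(r) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(r) < p_mut                 # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(m, X, y, V) for m in pop])
    return pop[scores.argmax()], float(scores.max()), fitness(lsi_mask, X, y, V)
```

Calling `ga_select_vectors(X, y, k=10)` on a labeled document-term matrix returns a boolean mask that may include vectors with small singular values and exclude ones with large singular values. Because the LSI baseline is seeded into the population and the best mask is always carried forward, the returned training fitness is never lower than the baseline's; a held-out set would be needed to judge generalization.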
Pages: 5938 - 5947
Number of pages: 10
Related papers
50 items in total
  • [41] Latent Semantic Analysis: An Approach to Understand Semantic of Text
    Kherwa, Pooja
    Bansal, Poonam
    2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 870 - 874
  • [42] A Text Classification Model via Multi-Level Semantic Features
    Mao, Keji
    Xu, Jinyu
    Yao, Xingda
    Qiu, Jiefan
    Chi, Kaikai
    Dai, Guanglin
    SYMMETRY-BASEL, 2022, 14 (09):
  • [43] Text segmentation by latent semantic indexing
    Ishioka, T
    NEW DEVELOPMENTS IN PSYCHOMETRICS, 2003, : 689 - 696
  • [44] Latent semantic analysis for text segmentation
    Choi, FYY
    Wiemer-Hastings, P
    Moore, J
    PROCEEDINGS OF THE 2001 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2001, : 109 - 117
  • [45] Using Semantic Correlation of HowNet for Short Text Classification
    Ning, Yahui
    Zhang, Li
    Ju, Yarong
    Wang, Weijia
    Li, Shunqin
    APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1931 - 1934
  • [46] Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures
    Song, Wei
    Li, Cheng Hua
    Park, Soon Cheol
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (05) : 9095 - 9104
  • [47] Text classification using small number of features
    Makrehchi, M
    Kamel, MS
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2005, 3587 : 580 - 589
  • [48] Novel Machine Learning-Based Approach for Arabic Text Classification Using Stylistic and Semantic Features
    Fkih, Fethi
    Alsuhaibani, Mohammed
    Rhouma, Delel
    Qamar, Ali Mustafa
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03) : 5871 - 5886
  • [49] Text mining using nonnegative matrix factorization and latent semantic analysis
    Hassani, Ali
    Iranmanesh, Amir
    Mansouri, Najme
    Neural Computing and Applications, 2021, 33 (20) : 13745 - 13766