Text classification using genetic algorithm oriented latent semantic features

Cited by: 72
Authors
Uysal, Alper Kursat [1]
Gunal, Serkan [1]
Affiliations
[1] Anadolu Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Feature selection; Genetic algorithm; Latent semantic indexing; Text classification; Feature-selection method; Categorization; LSI
DOI
10.1016/j.eswa.2014.03.041
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, genetic algorithm oriented latent semantic features (GALSF) are proposed to obtain a better representation of documents in text classification. The proposed approach consists of a feature selection stage and a feature transformation stage. The first stage is carried out using state-of-the-art filter-based methods. The second stage employs latent semantic indexing (LSI) empowered by a genetic algorithm, so that a better projection is attained using appropriate singular vectors. Unlike in the standard LSI approach, these vectors are not limited to the ones corresponding to the largest singular values. In this way, singular vectors with small singular values may be used for projection, and vectors with large singular values may be eliminated, to obtain better discrimination. Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions. (C) 2014 Elsevier Ltd. All rights reserved.
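The second stage described in the abstract can be illustrated with a minimal sketch. It assumes a dense documents-by-terms matrix and encodes the choice of singular vectors as a boolean mask evolved by a very simple genetic algorithm. The function names, the nearest-centroid fitness, and the GA settings (population size, one-point crossover, bit-flip mutation) are hypothetical choices made for illustration; the abstract does not specify them, and the filter-based first stage is omitted.

```python
import numpy as np

def lsi_basis(X):
    """SVD of a documents-by-terms matrix; rows of Vt are right singular vectors."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s, Vt

def project(X, Vt, mask):
    """Project documents onto the singular vectors selected by a boolean mask."""
    return X @ Vt[mask].T

def fitness(mask, X, y, Vt):
    """Assumed fitness: training accuracy of a nearest-centroid classifier."""
    if not mask.any():
        return 0.0
    Z = project(X, Vt, mask)
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y).mean())

def ga_select(X, y, Vt, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve a boolean mask over singular vectors (illustrative GA only)."""
    rng = np.random.default_rng(seed)
    r = Vt.shape[0]
    pop = rng.random((pop_size, r)) < 0.5          # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y, Vt) for m in pop])
        order = np.argsort(-scores)
        parents = pop[order[: pop_size // 2]]      # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, r) if r > 1 else 0
            child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
            child ^= rng.random(r) < p_mut              # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(m, X, y, Vt) for m in pop])
    return pop[scores.argmax()]

# Toy usage with random data (not a benchmark dataset).
rng = np.random.default_rng(1)
X_demo = rng.random((12, 6))
y_demo = np.array([0, 1] * 6)
s_demo, Vt_demo = lsi_basis(X_demo)
mask_demo = ga_select(X_demo, y_demo, Vt_demo, pop_size=8, generations=5)
Z_demo = project(X_demo, Vt_demo, mask_demo)
```

Standard LSI corresponds to a mask that is true only for the first k singular vectors. The sketch lets the search switch any vector on or off, which is the behaviour the abstract attributes to GALSF. A realistic fitness function would use held-out data rather than training accuracy.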
Pages: 5938 - 5947
Number of pages: 10
Related papers
50 in total
  • [21] Automatic Text Summarization Using Latent Semantic Analysis
    Mashechkin, I. V.
    Petrovskiy, M. I.
    Popov, D. S.
    Tsarev, D. V.
    PROGRAMMING AND COMPUTER SOFTWARE, 2011, 37 (06) : 299 - 305
  • [22] Evaluating the utility of statistical phrases and latent semantic indexing for text classification
    Wu, HW
    Gunopulos, D
    2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 713 - 716
  • [23] Automatic text summarization using latent semantic analysis
    I. V. Mashechkin
    M. I. Petrovskiy
    D. S. Popov
    D. V. Tsarev
    Programming and Computer Software, 2011, 37 : 299 - 305
  • [24] KANNADA TEXT SUMMARIZATION USING LATENT SEMANTIC ANALYSIS
    Geetha, J. K.
    Deepamala, N.
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2015, : 1508 - 1512
  • [25] Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders
    Alkhatib, Wael
    Rensing, Christoph
    Silberbauer, Johannes
    LANGUAGE, DATA, AND KNOWLEDGE, LDK 2017, 2017, 10318 : 380 - 394
  • [26] Learning Semantic Text Features for Web Text-Aided Image Classification
    Wang, Dongzhe
    Mao, Kezhi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (12) : 2985 - 2996
  • [27] Mammogram Tumor Classification using Multimodal Features and Genetic Algorithm
    Suganthi, M.
    Madheswaran, M.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, COMMUNICATION AND ENERGY CONSERVATION INCACEC 2009 VOL 1, 2009, : 190 - 195
  • [28] A Genetic Algorithm for Text Classification Rule Induction
    Pietramala, Adriana
    Policicchio, Veronica L.
    Rullo, Pasquale
    Sidhu, Inderbir
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART II, PROCEEDINGS, 2008, 5212 : 188 - +
  • [29] Practical study on the semantic analysis algorithm based on text classification
    Deng, Lei
    Tao, Xingzhen
    Tian, Gaohua
    INTERNATIONAL CONFERENCE ON ALGORITHMS, HIGH PERFORMANCE COMPUTING, AND ARTIFICIAL INTELLIGENCE (AHPCAI 2021), 2021, 12156
  • [30] Latent semantic text classification method research based on support vector machine
    Lu Q.
    Wang Y.
    International Journal of Information and Communication Technology, 2019, 15 (03) : 243 - 255