Analysis of web clustering based on genetic algorithm with latent semantic indexing technology

Cited by: 2
Authors
Song, Wei [1]
Park, Soon Cheol [1]
Affiliations
[1] Chonbuk Natl Univ Korea, Div Elect & Informat Engn, Chonju, South Korea
DOI
10.1109/ALPIT.2007.77
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper constructs a latent semantic text model for web clustering using a genetic algorithm (GA). The main difficulty in applying a GA to text clustering is the dimensionality of the feature space, which can reach thousands or even tens of thousands of dimensions. Latent semantic indexing (LSI) is a well-established technique that attempts to uncover the latent semantic structure in textual data; it also reduces this large space to a smaller one and provides a robust space for clustering. GAs are search techniques that efficiently evolve toward an optimal solution to a problem. Evolved in the reduced latent semantic indexing model, the GA can improve both clustering accuracy and speed, which makes it well suited to real-time clustering. We used the SSTRESS criterion to analyze the dissimilarity between the original term-by-document corpus matrix and its approximate decompositions of different ranks, and related that dissimilarity to the performance of our algorithm in the reduced space. Clustering results on Reuters text show that the GA applied in the LSI model outperforms both K-means and a conventional GA in the vector space model (VSM).
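The rank-reduction step described in the abstract can be illustrated with a minimal sketch. It assumes a tiny made-up term-by-document matrix, uses NumPy's SVD for the LSI decomposition, and measures an SSTRESS-style dissimilarity over squared inter-document distances. The abstract does not give the authors' exact SSTRESS formulation, so the normalization here (borrowed from the multidimensional scaling literature) and the helper names `lsi_reduce`, `squared_distances`, and `sstress` are assumptions for illustration only.

```python
import numpy as np

def lsi_reduce(term_doc, rank):
    """Truncated SVD of a term-by-document matrix (rows = terms, columns = documents).

    Returns the documents' coordinates in the rank-k latent space and the
    rank-k approximation of the original matrix.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k, s_k, vt_k = u[:, :rank], s[:rank], vt[:rank, :]
    scaled = s_k[:, None] * vt_k          # shape (rank, n_docs)
    doc_coords = scaled.T                 # one row per document
    approx = u_k @ scaled                 # rank-k approximation of term_doc
    return doc_coords, approx

def squared_distances(rows):
    """Pairwise squared Euclidean distances between the rows of a matrix."""
    diff = rows[:, None, :] - rows[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def sstress(original, approx):
    """Assumed normalized SSTRESS between document distances in two matrices."""
    d_orig = squared_distances(original.T)    # columns are documents
    d_approx = squared_distances(approx.T)
    return float(np.sqrt(np.sum((d_orig - d_approx) ** 2) / np.sum(d_orig ** 2)))

# Hypothetical toy corpus: 5 terms, 4 documents.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)

for k in (1, 2, 3, 4):
    coords, approx = lsi_reduce(A, k)
    print(k, coords.shape, sstress(A, approx))
```

A clustering search, whether a GA or K-means, would then operate on `coords` instead of the full term space; the dissimilarity value gives a rough indication of how much inter-document structure a given rank preserves.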
Pages: 21 / +
Page count: 2