Analysis of web clustering based on genetic algorithm with latent semantic indexing technology

Cited by: 2
Authors
Song, Wei [1]
Park, Soon Cheol [1]
Affiliations
[1] Chonbuk Natl Univ Korea, Div Elect & Informat Engn, Chonju, South Korea
DOI
10.1109/ALPIT.2007.77
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper constructs a latent semantic text model that uses a genetic algorithm (GA) for web clustering. The main difficulty in applying GA to text clustering is that the feature space has thousands or even tens of thousands of dimensions. Latent semantic indexing (LSI) is a successful technique that attempts to uncover the latent semantic structure in textual data; it also reduces this large space to a smaller one and provides a robust space for clustering. GA belongs to a class of search techniques that efficiently evolve toward an optimal solution to a problem. Evolved in the reduced LSI model, GA can improve both clustering accuracy and speed, which makes it suitable for real-time clustering. We use the SSTRESS criterion to analyze the dissimilarity between the original term-by-document corpus matrix and its approximate decomposition matrix at different ranks, and relate this dissimilarity to the performance of our algorithm evolved in the reduced space. The superiority of GA applied in the LSI model over K-means and conventional GA in the vector space model (VSM) is demonstrated by good clustering results on Reuters text.
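The rank-reduction step and the dissimilarity check described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the SSTRESS formula, so the normalized form below (comparing squared inter-document distances, as in multidimensional scaling) is an assumption, and the toy matrix and the function names `rank_k_approximation`, `squared_distances`, and `sstress` are hypothetical.

```python
import numpy as np


def rank_k_approximation(A, k):
    """Best rank-k approximation of A in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def squared_distances(A):
    """Squared Euclidean distances between the columns (documents) of A."""
    G = A.T @ A                      # Gram matrix of document vectors
    sq = np.diag(G)                  # squared norms of each document
    return sq[:, None] + sq[None, :] - 2.0 * G


def sstress(A, A_k):
    """Normalized SSTRESS between document distances in A and A_k.

    Assumed form (MDS convention), not taken from the paper:
    sqrt( sum (d_ij^2 - dhat_ij^2)^2 / sum d_ij^4 ).
    """
    D = squared_distances(A)
    D_k = squared_distances(A_k)
    return float(np.sqrt(np.sum((D - D_k) ** 2) / np.sum(D ** 2)))


# Made-up 6-term x 5-document count matrix, for illustration only.
A = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 2, 1, 0],
    [0, 0, 1, 2, 1],
    [1, 0, 0, 0, 2],
], dtype=float)

# Dissimilarity between the original matrix and each rank-k approximation.
for k in range(1, 6):
    print(k, sstress(A, rank_k_approximation(A, k)))
```

Under this assumed definition the value shrinks as the rank k grows and reaches zero (up to rounding) at full rank, so it gives one way to judge how much document-distance structure a given reduced space preserves before clustering in it.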
Pages: 21 / +
Page count: 2