Analysis of web clustering based on genetic algorithm with latent semantic indexing technology

Cited by: 2
Authors
Song, Wei [1]
Park, Soon Cheol [1]
Affiliations
[1] Chonbuk Natl Univ Korea, Div Elect & Informat Engn, Chonju, South Korea
Keywords
DOI
10.1109/ALPIT.2007.77
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper constructs a latent semantic text model for web clustering using a genetic algorithm (GA). The main difficulty in applying GA to text clustering is the dimensionality of the feature space, which can reach thousands or even tens of thousands of dimensions. Latent semantic indexing (LSI) is a successful technique that attempts to uncover the latent semantic structure in textual data; it also reduces this large space to a smaller one and provides a robust space for clustering. GA belongs to the family of search techniques that can efficiently evolve an optimal solution to a problem. Evolved in the reduced latent semantic indexing model, GA can improve clustering accuracy and speed, which makes it well suited to real-time clustering. We used the SSTRESS criterion to analyze the dissimilarity between the original term-by-document corpus matrix and its approximate decomposition matrix at different ranks, and related it to the performance of our algorithm evolved in the reduced space. The superiority of GA applied in the LSI model over K-means and conventional GA in the vector space model (VSM) is demonstrated by good clustering results on Reuters text.
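As a rough illustration of the rank reduction and dissimilarity analysis the abstract describes, the sketch below builds a rank-k LSI approximation of a small term-by-document matrix with a truncated SVD and scores it with an SSTRESS-style measure. Everything here is an assumption for illustration: the toy matrix, the function names `lsi_reduce`, `squared_distances`, and `sstress`, and the normalized squared-distance form of SSTRESS borrowed from multidimensional scaling. The paper's exact formulation may differ, and the GA clustering step itself is not shown.

```python
import numpy as np

def lsi_reduce(A, k):
    """Truncated SVD of a term-by-document matrix A (terms x documents).

    Returns the k-dimensional document coordinates (one row per document)
    and the rank-k approximation of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    A_k = (Uk * sk) @ Vtk              # rank-k approximation of A
    docs_k = (sk[:, None] * Vtk).T     # documents in the reduced space
    return docs_k, A_k

def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)          # clip tiny negatives from round-off

def sstress(A, A_k):
    """Assumed SSTRESS-style score: normalized mismatch between squared
    document-to-document distances in A and in its approximation A_k."""
    D = squared_distances(A.T)
    D_k = squared_distances(A_k.T)
    return np.sqrt(np.sum((D - D_k) ** 2) / np.sum(D ** 2))

# Hypothetical toy corpus: 6 terms (rows) by 4 documents (columns).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 0.0, 1.0],
])

for k in range(1, 5):
    docs_k, A_k = lsi_reduce(A, k)
    print(k, docs_k.shape, sstress(A, A_k))
```

In a pipeline like the one described, a clustering method such as a GA or K-means would operate on the rows of `docs_k` rather than on the full-dimensional columns of `A`, and the score would help choose a rank that keeps document distances close to the originals.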
Pages: 21 / +
Number of pages: 2
Related papers
50 in total
  • [1] Genetic algorithm for text clustering based on latent semantic indexing
    Song, Wei
    Park, Soon Cheol
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2009, 57 (11-12) : 1901 - 1907
  • [2] The New Clustering Strategy and Algorithm Based on Latent Semantic Indexing
    Yan, Bing
    Du, YaJun
    Li, ZhanShen
    ICNC 2008: FOURTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 1, PROCEEDINGS, 2008, : 486 - 490
  • [3] Clustering algorithms and latent semantic indexing to identify similar pages in web applications
    De Lucia, Andrea
    Risi, Michele
    Tortora, Genoveffa
    Scanniello, Giuseppe
    WSE 2007: NINTH IEEE INTERNATIONAL SYMPOSIUM ON WEB SITE EVOLUTION, PROCEEDINGS, 2007, : 65 - +
  • [4] A novel word clustering algorithm based on latent semantic analysis
    Bellegarda, JR
    Butzberger, JW
    Chow, YL
    Coccaro, NB
    Naik, D
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 172 - 175
  • [5] Research On Optimize Technology in Latent Semantic Indexing Based On Semantic Block
    Cai, Dongfeng
    Guo, Dongbo
    Ji, Duo
    PROCEEDINGS OF THE 2009 CHINESE CONFERENCE ON PATTERN RECOGNITION AND THE FIRST CJK JOINT WORKSHOP ON PATTERN RECOGNITION, VOLS 1 AND 2, 2009, : 680 - 684
  • [6] Technology classification with latent semantic indexing
    Thorleuchter, Dirk
    Van den Poel, Dirk
    EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (05) : 1786 - 1795
  • [7] Latent semantic indexing for web service retrieval
    Czyszczoń, Adam (adam.czyszczon@pwr.edu.pl), 1600, Springer Verlag (8733):
  • [8] Latent Semantic Indexing for Web Service Retrieval
    Czyszczon, Adam
    Zgrzywa, Aleksander
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, ICCCI 2014, 2014, 8733 : 694 - 702
  • [9] INDEXING BY LATENT SEMANTIC ANALYSIS
    DEERWESTER, S
    DUMAIS, ST
    FURNAS, GW
    LANDAUER, TK
    HARSHMAN, R
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1990, 41 (06) : 391 - 407
  • [10] A Latent Semantic Indexing-based approach to multilingual document clustering
    Wei, Chih-Ping
    Yang, Christopher C.
    Lin, Chia-Min
    DECISION SUPPORT SYSTEMS, 2008, 45 (03) : 606 - 620