A parallel hybrid web document clustering algorithm and its performance study

Times cited: 16
Authors
Xu, ST [1]
Zhang, J [1]
Affiliation
[1] Univ Kentucky, Dept Comp Sci, Lab High Performance Sci Comp & Comp Simulat, Lexington, KY 40506 USA
Source
JOURNAL OF SUPERCOMPUTING | 2004 / Vol. 30 / No. 2
Keywords
information retrieval; parallel document clustering; PDDP; K-means
DOI
10.1023/B:SUPE.0000040611.25862.d9
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Clustering web documents is an important procedure in many web information retrieval systems. As the size of the Internet grows rapidly and the number of information requests increases exponentially, the use of parallel computing techniques in large-scale web document retrieval is unavoidable. We propose a parallel hybrid web document clustering algorithm that combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments were conducted to test the performance of the hybrid algorithm on three real-life web document datasets, and the results were compared with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the clustering solutions obtained from the hybrid algorithm are of higher quality than those obtained from parallel PDDP or parallel K-means. The parallel run time of the hybrid algorithm is similar to, and sometimes less than, that of the widely used K-means algorithm.
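The abstract names the two ingredients but not how they are joined. As a minimal, serial illustration, here is a Python sketch of one common way to combine them, assuming PDDP is run first to produce k clusters and their centroids then seed K-means. This is not the authors' parallel implementation. The helper names (`pddp`, `kmeans`, `hybrid`), the power-iteration step for the principal direction, Euclidean distance on raw term counts, and the toy data are all hypothetical choices made for illustration. The PDDP part follows the usual rule of bisecting the cluster with the largest scatter by the sign of each document's projection onto that cluster's principal direction.

```python
# Serial sketch only; hypothetical helper names, not the paper's parallel code.
import math
import random


def centroid(docs):
    """Mean vector of a non-empty list of equal-length document vectors."""
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]


def scatter(docs):
    """Sum of squared Euclidean distances from each document to the centroid."""
    c = centroid(docs)
    return sum(sum((x - m) ** 2 for x, m in zip(d, c)) for d in docs)


def principal_direction(docs, iters=100, seed=0):
    """Leading principal direction of the centered vectors, via power iteration."""
    c = centroid(docs)
    centered = [[x - m for x, m in zip(d, c)] for d in docs]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in c]
    for _ in range(iters):
        av = [sum(a * b for a, b in zip(row, v)) for row in centered]  # A v
        w = [sum(av[i] * row[j] for i, row in enumerate(centered))     # A^T (A v)
             for j in range(len(c))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all documents identical: no direction to split on
            break
        v = [x / norm for x in w]
    return c, v


def pddp(docs, k):
    """Divisive partitioning: repeatedly bisect the highest-scatter cluster."""
    clusters = [list(docs)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: scatter(clusters[j]))
        c, v = principal_direction(clusters[i])
        left, right = [], []
        for d in clusters[i]:
            # Sign of the projection of (d - c) onto the principal direction.
            proj = sum((x - m) * u for x, m, u in zip(d, c, v))
            (left if proj <= 0 else right).append(d)
        if not left or not right:  # degenerate split: stop early
            break
        clusters[i:i + 1] = [left, right]
    return clusters


def kmeans(docs, centers, iters=50):
    """Lloyd-style K-means started from the given centers; one label per doc."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centers)),
                key=lambda j: sum((x - m) ** 2 for x, m in zip(d, centers[j])))
            for d in docs
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties out
                centers[j] = centroid(members)
    return labels


def hybrid(docs, k):
    """PDDP produces the initial partition; K-means refines it (assumed scheme)."""
    seeds = [centroid(cluster) for cluster in pddp(docs, k)]
    return kmeans(docs, seeds)


# Toy term-count vectors over four terms (hypothetical data).
docs = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 2, 3], [0, 0, 1, 2]]
labels = hybrid(docs, 2)
print(labels)
```

On this toy input, documents 0 and 1 end up with one label and documents 2 and 3 with the other; which group receives label 0 depends on the sign of the computed principal direction. The appeal of this ordering is that PDDP gives K-means a deterministic, data-driven starting partition instead of random seeds. Real web document collections are usually TF-IDF weighted and length-normalized, and the parallel versions distribute documents across processors, neither of which this sketch attempts.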
Pages: 117-131
Number of pages: 15