A parallel hybrid web document clustering algorithm and its performance study

Cited by: 16
Authors
Xu, ST [1]
Zhang, J [1]
Affiliation
[1] Univ Kentucky, Dept Comp Sci, Lab High Performance Sci Comp & Comp Simulat, Lexington, KY 40506 USA
Source
Journal of Supercomputing | 2004 / Vol. 30 / No. 2
Keywords
information retrieval; parallel document clustering; PDDP; K-means
DOI
10.1023/B:SUPE.0000040611.25862.d9
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Clustering web documents is an important procedure in many web information retrieval systems. As the size of the Internet grows rapidly and the number of information requests increases exponentially, the use of parallel computing techniques in large-scale web document retrieval is unavoidable. We propose a parallel hybrid web document clustering algorithm, which combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments were conducted to test the performance of the hybrid algorithm using three real-life web document datasets, and the results were compared with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the quality of the clustering solutions obtained from the hybrid algorithm is better than that from the parallel PDDP or the parallel K-means algorithm. The parallel run time of the hybrid algorithm is similar to, and sometimes less than, that of the widely used K-means algorithm.
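The abstract does not say how PDDP and K-means are combined. One plausible reading, assumed here and not confirmed by this record, is that PDDP produces an initial partition whose centroids then seed K-means. The sketch below is serial, uses plain Euclidean distance on toy vectors, and omits the parallel part of the method entirely; all function names (`pddp`, `kmeans`, `hybrid_cluster`) are hypothetical.

```python
def centroid(rows):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(rows):
    """Total squared distance of the rows to their centroid."""
    c = centroid(rows)
    return sum(sqdist(r, c) for r in rows)

def principal_direction(rows, iters=100):
    """Centroid and leading principal direction, via power iteration."""
    c = centroid(rows)
    centered = [[x - m for x, m in zip(r, c)] for r in rows]
    d = len(c)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary deterministic start
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return c, v

def pddp(rows, k):
    """Divisive step: repeatedly split the cluster with the largest scatter
    by the sign of the projection onto its principal direction."""
    clusters = [list(range(len(rows)))]
    while len(clusters) < k:
        i = max(range(len(clusters)),
                key=lambda t: scatter([rows[j] for j in clusters[t]]))
        members = clusters.pop(i)
        c, v = principal_direction([rows[j] for j in members])
        left = [j for j in members
                if sum((x - m) * u for x, m, u in zip(rows[j], c, v)) <= 0]
        left_set = set(left)
        right = [j for j in members if j not in left_set]
        if not left or not right:  # cannot split further
            clusters.append(members)
            break
        clusters += [left, right]
    return clusters

def kmeans(rows, centers, iters=50):
    """Standard Lloyd iterations starting from the given centers."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: sqdist(r, centers[c]))
                  for r in rows]
        new_centers = []
        for c in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def hybrid_cluster(rows, k):
    """Assumed combination: PDDP centroids seed K-means."""
    seeds = [centroid([rows[j] for j in cl]) for cl in pddp(rows, k)]
    return kmeans(rows, seeds)

# Toy "document vectors": two well-separated groups in two dimensions.
docs = [[1.0, 0.0], [0.9, 0.1], [1.1, 0.0],
        [0.0, 1.0], [0.1, 0.9], [0.0, 1.1]]
labels, centers = hybrid_cluster(docs, 2)
print(labels)
```

Seeding K-means from a deterministic divisive partition removes the dependence on random initial centroids, which is one reason such a combination could improve cluster quality; whether that is the mechanism behind the reported results is not stated in the abstract.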
Pages: 117-131
Page count: 15