A parallel hybrid web document clustering algorithm and its performance study

Cited by: 16
Authors
Xu, ST [1]
Zhang, J [1]
Affiliation
[1] Univ Kentucky, Dept Comp Sci, Lab High Performance Sci Comp & Comp Simulat, Lexington, KY 40506 USA
Source
JOURNAL OF SUPERCOMPUTING | 2004 / Vol. 30 / No. 02
Keywords
information retrieval; parallel document clustering; PDDP; K-means
DOI
10.1023/B:SUPE.0000040611.25862.d9
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering web documents is an important procedure in many web information retrieval systems. As the size of the Internet grows rapidly and the number of information requests increases exponentially, the use of parallel computing techniques in large-scale web document retrieval is unavoidable. We propose a parallel hybrid web document clustering algorithm that combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments were conducted to test the performance of the hybrid algorithm using three real-life web document datasets, and the results were compared with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the quality of the clustering solutions obtained from the hybrid algorithm is better than that from the parallel PDDP or the parallel K-means. The parallel run time of the hybrid algorithm is similar to, and sometimes less than, that of the widely used K-means algorithm.
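The abstract does not say how PDDP and K-means are combined, so the following is only a minimal serial sketch of one plausible reading: PDDP repeatedly bisects the cluster with the largest scatter along its principal direction, and the resulting cluster means are used as starting centroids for K-means. The function names (`pddp`, `kmeans`, `hybrid_cluster`), the row-per-document layout, the power-iteration approximation of the principal direction, and the largest-scatter selection rule are all assumptions made for illustration. No parallelization is shown.

```python
import math
import random

def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(rows):
    # Total squared distance of the rows to their own mean.
    c = mean(rows)
    return sum(sq_dist(r, c) for r in rows)

def principal_direction(rows, iters=200, seed=0):
    # Approximates the leading right singular vector of the centered rows
    # by power iteration on A^T A (an assumption; an SVD routine also works).
    c = mean(rows)
    centered = [[x - m for x, m in zip(r, c)] for r in rows]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in c]
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(len(c))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return c, v

def pddp(rows, k):
    # Divisive step: keep bisecting the cluster with the largest scatter.
    clusters = [list(rows)]
    while len(clusters) < k:
        clusters.sort(key=scatter)
        target = clusters.pop()
        c, v = principal_direction(target)
        left, right = [], []
        for r in target:
            p = sum((x - m) * d for x, m, d in zip(r, c, v))
            (left if p <= 0 else right).append(r)
        if not left or not right:  # cluster cannot be split further
            clusters.append(target)
            break
        clusters += [left, right]
    return clusters

def kmeans(rows, centroids, iters=50):
    # Standard Lloyd iterations starting from the supplied centroids.
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centroids)),
                          key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids

def hybrid_cluster(rows, k):
    # Assumed combination: PDDP cluster means seed the K-means refinement.
    seeds = [mean(c) for c in pddp(rows, k)]
    return kmeans(rows, seeds)
```

Calling `hybrid_cluster(rows, k)` on a list of equal-length numeric vectors returns one label per row plus the final centroids. Seeding K-means deterministically from a divisive pass is one common motivation for this kind of hybrid, since plain K-means is sensitive to its random starting centroids.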
Pages: 117 - 131
Page count: 15