A parallel hybrid web document clustering algorithm and its performance study

Cited by: 16

Authors:

Xu, ST [1]

Zhang, J [1]

Affiliation:

[1] Univ Kentucky, Dept Comp Sci, Lab High Performance Sci Comp & Comp Simulat, Lexington, KY 40506 USA

Source:

JOURNAL OF SUPERCOMPUTING | 2004 / Vol. 30 / No. 02

Keywords:

information retrieval; parallel document clustering; PDDP; K-means

DOI:

10.1023/B:SUPE.0000040611.25862.d9

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Clustering web documents is an important procedure in many web information retrieval systems. As the size of the Internet grows rapidly and the number of information requests increases exponentially, the use of parallel computing techniques in large-scale web document retrieval is unavoidable. We propose a parallel hybrid web document clustering algorithm, which combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments were conducted to test the performance of the hybrid algorithm using three real-life web document datasets, and the results were compared with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the quality of the clustering solutions obtained from the hybrid algorithm is better than that from the parallel PDDP or the parallel K-means. The parallel run time of the hybrid algorithm is similar to and sometimes less than that of the widely used K-means algorithm.
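
The abstract does not say how the two algorithms are combined, so the following is only a minimal serial sketch under stated assumptions: PDDP is run first to produce k clusters, and the centroids of those clusters are then used as the starting centers for K-means. The function names (`pddp`, `kmeans`, `hybrid_cluster`), the use of Euclidean distance, the power-iteration estimate of the principal direction, and the absence of any parallelism are all illustrative choices, not details taken from the paper.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(rows):
    """Sum of squared distances to the centroid; picks the next cluster to split."""
    c = centroid(rows)
    return sum(sq_dist(r, c) for r in rows)

def principal_projections(rows, iters=100, seed=0):
    """Project centered rows onto their leading principal direction,
    estimated by power iteration on A^T A (A = centered data)."""
    c = centroid(rows)
    a = [[x - m for x, m in zip(r, c)] for r in rows]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in c]
    for _ in range(iters):
        proj = [sum(x * y for x, y in zip(r, v)) for r in a]
        w = [sum(p * r[j] for p, r in zip(proj, a)) for j in range(len(c))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(x * y for x, y in zip(r, v)) for r in a]

def pddp(rows, k):
    """Divisive phase: repeatedly bisect the cluster with the largest scatter."""
    clusters = [list(range(len(rows)))]
    while len(clusters) < k:
        target = max(clusters, key=lambda g: scatter([rows[i] for i in g]))
        proj = principal_projections([rows[i] for i in target])
        left = [i for i, p in zip(target, proj) if p <= 0.0]
        right = [i for i, p in zip(target, proj) if p > 0.0]
        if not left or not right:
            break  # the chosen cluster cannot be split any further
        clusters.remove(target)
        clusters += [left, right]
    return clusters

def kmeans(rows, centers, iters=20):
    """Refinement phase: Lloyd iterations starting from the given centers."""
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    return labels, centers

def hybrid_cluster(rows, k, kmeans_iters=20):
    """Assumed combination: PDDP clusters seed the K-means centers."""
    groups = pddp(rows, k)
    seeds = [centroid([rows[i] for i in g]) for g in groups]
    return kmeans(rows, seeds, kmeans_iters)
```

Calling `hybrid_cluster(vectors, k)` on a list of document feature vectors returns one cluster label per document together with the final centers. Seeding K-means from a deterministic divisive split avoids its dependence on random initial centers, which is one plausible motivation for a hybrid of this kind.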

Pages: 117-131

Number of pages: 15
