Document Clustering Using Incremental and Pairwise Approaches

Cited by: 0
Authors
Tran, Tien [1]
Nayak, Richi [1]
Bruza, Peter [1]
Affiliation
[1] Queensland Univ Technol, Brisbane, Qld 4001, Australia
Source
Keywords
Clustering; structure; content; XML; INEX 2007
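DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract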
This paper presents the experiments and results of an approach to clustering the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The approach combines an incremental clustering method with a pairwise clustering method. The incremental method first reduces the dimension of the dataset by grouping the documents into a number of clusters that is not fixed in advance. The pairwise method then clusters this lower-dimension dataset into the required number of clusters. In this way, the large number of documents is clustered successfully and an accurate clustering solution is obtained.
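The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the threshold, or the merge criterion, so everything below is an assumption made for illustration: documents are plain numeric vectors, similarity is cosine, the incremental stage is a single-pass threshold assignment, and the pairwise stage merges the two clusters with the most similar centroids. The names `incremental_stage`, `pairwise_stage`, `two_stage_cluster`, and `threshold` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def incremental_stage(docs, threshold):
    """Single pass (assumed scheme): put each document into the most similar
    existing cluster if similarity >= threshold, otherwise open a new cluster.
    The number of resulting clusters is not fixed in advance."""
    clusters, centroids = [], []
    for i, doc in enumerate(docs):
        best, best_sim = None, threshold
        for c, cen in enumerate(centroids):
            sim = cosine(doc, cen)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(doc))
        else:
            clusters[best].append(i)
            centroids[best] = centroid([docs[j] for j in clusters[best]])
    return clusters


def pairwise_stage(docs, clusters, k):
    """Compare all cluster pairs by centroid similarity (assumed criterion)
    and merge the closest pair until only k clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        cents = [centroid([docs[j] for j in c]) for c in clusters]
        best_pair, best_sim = None, -2.0
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(cents[a], cents[b])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def two_stage_cluster(docs, threshold, k):
    """Reduce the dataset incrementally, then cluster the result pairwise."""
    return pairwise_stage(docs, incremental_stage(docs, threshold), k)
```

The point of the split is cost: the pairwise stage compares every pair of clusters, which is only affordable once the incremental pass has collapsed the documents into far fewer groups.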
Pages: 222-233
Number of pages: 12
Related papers
50 in total
  • [31] Document clustering using sample weighting
    Zhang, Chengzhi
    Su, Xinning
    Zhou, Dongmin
    RECENT ADVANCE OF CHINESE COMPUTING TECHNOLOGIES, 2007, : 260 - 265
  • [32] Multi document summarization using clustering
    Balabantaray, R.C.
    Sahoo, D.K.
    Swain, M.
    Sahoo, B.
    Journal of Theoretical and Applied Information Technology, 2012, 46 (02) : 565 - 571
  • [33] Document clustering using differential evolution
    Abraham, Ajith
    Das, Swagatam
    Konar, Amit
    2006 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-6, 2006, : 1769 - +
  • [34] Document clustering using compound words
    Wang, Y
    Hodges, J
    ICAI '05: PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 AND 2, 2005, : 307 - 313
  • [35] Refinement of document clustering by using NMF
    Shinnou, Hiroyuki
    Sasaki, Minoru
    PACLIC 21 - The 21st Pacific Asia Conference on Language, Information and Computation, Proceedings, 2007, : 430 - 439
  • [36] Beyond pairwise clustering
    Agarwal, S
    Lim, J
    Zelnik-Manor, L
    Perona, P
    Kriegman, D
    Belongie, S
    2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol 2, Proceedings, 2005, : 838 - 845
  • [37] Hierarchical co-clustering: off-line and incremental approaches
    Ruggero G. Pensa
    Dino Ienco
    Rosa Meo
    Data Mining and Knowledge Discovery, 2014, 28 : 31 - 64
  • [38] Hierarchical co-clustering: off-line and incremental approaches
    Pensa, Ruggero G.
    Ienco, Dino
    Meo, Rosa
    DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 28 (01) : 31 - 64
  • [39] Malware Variant Identification Using Incremental Clustering
    Black, Paul
    Gondal, Iqbal
    Bagirov, Adil
    Moniruzzaman, Md
    ELECTRONICS, 2021, 10 (14)
  • [40] INCREMENTAL CLUSTERING USING INFORMATION BOTTLENECK THEORY
    Liu, Yongli
    Ouyang, Yuanxin
    Xiong, Zhang
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2011, 25 (05) : 695 - 712