A flocking based algorithm for document clustering analysis

Cited by: 54
Authors
Cui, Xiaohui [1]
Gao, Jinzhu [1]
Potok, Thomas E. [1]
Affiliations
[1] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Keywords
document clustering; bio-inspired; agent; flocking model; F-measure
DOI
10.1016/j.sysarc.2006.02.003
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partitional clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data by embedding the high-dimensional data items on a two-dimensional grid, which makes the clustering result easy to retrieve and visualize. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance than the K-means and Ant clustering algorithms for real document clustering. © 2006 Elsevier B.V. All rights reserved.
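The abstract does not state the boid rules themselves, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes Reynolds-style local rules (cohesion, alignment, separation) plus a hypothetical similarity rule: a boid is attracted to nearby boids whose documents are similar and repelled by dissimilar ones. The function names, the cosine similarity measure, and all parameters (`radius`, `sim_threshold`, `gain`, `min_dist`, `max_speed`, `size`) are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two document feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def flock_step(pos, vel, docs, radius=10.0, sim_threshold=0.5,
               gain=0.1, min_dist=1.0, max_speed=1.0, size=100.0):
    """Advance every document boid by one step on a size x size plane.

    pos, vel: lists of (x, y) tuples; docs: list of feature vectors.
    All rules and parameters here are assumptions for illustration.
    """
    n = len(pos)
    new_vel = []
    for i in range(n):
        ax = ay = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0 or dist > radius:
                continue  # only neighbors inside the sensing radius matter
            ux, uy = dx / dist, dy / dist
            if cosine(docs[i], docs[j]) >= sim_threshold:
                # Similar documents: cohesion (move closer) and
                # alignment (match the neighbor's velocity).
                ax += ux + (vel[j][0] - vel[i][0])
                ay += uy + (vel[j][1] - vel[i][1])
            else:
                # Dissimilar documents: steer away.
                ax -= ux
                ay -= uy
            if dist < min_dist:
                # Separation: avoid collisions at very short range.
                ax -= ux
                ay -= uy
        vx = vel[i][0] + gain * ax
        vy = vel[i][1] + gain * ay
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_vel.append((vx, vy))
    # Move each boid and clamp it to the plane boundaries.
    new_pos = [(min(max(x + vx, 0.0), size), min(max(y + vy, 0.0), size))
               for (x, y), (vx, vy) in zip(pos, new_vel)]
    return new_pos, new_vel
```

Under these assumptions, repeatedly calling `flock_step` lets boids carrying similar documents drift into shared groups on the plane, and the groups can then be read off as clusters. No initial cluster seeds are supplied at any point, which is the contrast with K-means that the abstract draws.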
Pages: 505-515
Page count: 11
Related papers
50 in total
  • [21] A Novel Hybrid Clustering Approach Based on Black Hole Algorithm for Document Clustering
    Malik, Fazila
    Khan, Salabat
    Rizwan, Atif
    Atteia, Ghada
    Samee, Nagwan Abdel
    IEEE ACCESS, 2022, 10 : 97310 - 97326
  • [23] A co-clustering algorithm based on structured Web document
    Deng, Dong-Mei
    Long, Ji-Zhen
    Yin, Xiang-Zhou
    Zhongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Central South University (Science and Technology), 2010, 41 (05): 1871 - 1876
  • [24] An incremental document clustering algorithm based on a hierarchical agglomerative approach
    Joo, KH
    Lee, SJ
    DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, PROCEEDINGS, 2005, 3816 : 321 - 332
  • [26] Concept based document clustering using K prototype Algorithm
    Pasarate, Sneha
    Shedge, Rajashree
    2018 INTERNATIONAL CONFERENCE ON CONTROL, POWER, COMMUNICATION AND COMPUTING TECHNOLOGIES (ICCPCCT), 2018 : 579 - 583
  • [27] An effective web document clustering algorithm based on bisection and merge
    Lee, Ingyu
    On, Byung-Won
    ARTIFICIAL INTELLIGENCE REVIEW, 2011, 36 (01) : 69 - 85
  • [28] Fuzzy Ontology for Distributed Document Clustering based on Genetic Algorithm
    Thangamani, M.
    Thangaraj, P.
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (04): 1563 - 1574
  • [29] A semi-supervised document clustering algorithm based on EM
    Rigutini, L
    Maggini, M
    2005 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, 2005 : 200 - 206
  • [30] Analysis of multipoint activity in the mouse brain based on flocking algorithm
    Zaleshina, Margarita
    Zaleshin, Alexander
    JOURNAL OF COMPUTATIONAL NEUROSCIENCE, 2023, 51 : S99 - S100