Web document clustering based on Global-Best Harmony Search, K-means, Frequent Term Sets and Bayesian Information Criterion

Times cited: 0
Authors
Cobos, Carlos [1]
Andrade, Jennifer [1]
Constain, William [1]
Mendoza, Martha [1]
Leon, Elizabeth [2]
Affiliations
[1] Univ Cauca, Popayan, Colombia
[2] Univ Nacl Colombia, Bogota, Colombia
Keywords
ALGORITHM; OPTIMIZATION; LINGO
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
This paper introduces a new description-centric algorithm for web document clustering based on the hybridization of Global-Best Harmony Search with the K-means algorithm, Frequent Term Sets and the Bayesian Information Criterion. The new algorithm determines the number of clusters automatically. Global-Best Harmony Search provides a global search strategy over the solution space, based on Harmony Search and the concept of swarm intelligence. The K-means algorithm is used to find the optimum within a local search space. The Bayesian Information Criterion is used as the fitness function, while FP-Growth is used to reduce the high dimensionality of the vocabulary. The resulting algorithm, called IGBHSK, was tested with data sets based on Reuters-21578 and DMOZ, obtaining promising results (better precision than a Singular Value Decomposition algorithm). It was also evaluated by a group of users.
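The interplay described in the abstract (a global Harmony Search, K-means as local search, BIC as fitness, and a number of clusters that is not fixed in advance) can be illustrated with a small Python sketch. It is a hedged toy, not the authors' IGBHSK code, and every name in it (`ghs_kmeans`, `kmeans_refine`, `bic`, `centroids_of`, `k_max`, `hms`, `hmcr`, `par`) is hypothetical. Assumptions: a harmony is a vector of cluster labels, one per document, drawn from `range(k_max)`; documents are toy 2-D points compared with squared Euclidean distance rather than term vectors under a cosine measure; the fitness is the simplified criterion BIC = n·ln(SSE/n) + k·ln(n), lower being better, which is an assumed form rather than one quoted from the paper; and the pitch-adjustment rate is held fixed for simplicity. The global-best pitch adjustment copies a value from a random position of the best harmony in memory.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance (a stand-in for a cosine-based distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(docs, labels):
    """Mean vector of every non-empty cluster, keyed by its label."""
    sums, counts = {}, {}
    for vec, lab in zip(docs, labels):
        if lab not in sums:
            sums[lab], counts[lab] = [0.0] * len(vec), 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def kmeans_refine(docs, labels, iters=5):
    """Local search: a few Lloyd (K-means) steps starting from a harmony's labels.

    Clusters that lose all members disappear, so k can only shrink here.
    """
    for _ in range(iters):
        cents = centroids_of(docs, labels)
        new = [min(cents, key=lambda c: sq_dist(d, cents[c])) for d in docs]
        if new == labels:
            break
        labels = new
    return labels


def bic(docs, labels):
    """Assumed simplified BIC: n*ln(SSE/n) + k*ln(n). Lower is better."""
    n = len(docs)
    cents = centroids_of(docs, labels)
    sse = sum(sq_dist(d, cents[lab]) for d, lab in zip(docs, labels))
    return n * math.log(max(sse, 1e-12) / n) + len(cents) * math.log(n)


def ghs_kmeans(docs, k_max=4, hms=10, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Global-best harmony search over label vectors, each refined by K-means."""
    rng = random.Random(seed)
    n = len(docs)
    memory = [kmeans_refine(docs, [rng.randrange(k_max) for _ in range(n)])
              for _ in range(hms)]
    scores = [bic(docs, h) for h in memory]
    for _ in range(iters):
        best = memory[scores.index(min(scores))]
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                value = memory[rng.randrange(hms)][i]  # memory consideration
                if rng.random() < par:
                    value = best[rng.randrange(n)]  # global-best pitch adjustment
            else:
                value = rng.randrange(k_max)  # random selection
            new.append(value)
        new = kmeans_refine(docs, new)
        score = bic(docs, new)
        worst = scores.index(max(scores))
        if score < scores[worst]:  # replace the worst harmony only if improved
            memory[worst], scores[worst] = new, score
    top = scores.index(min(scores))
    return memory[top], scores[top]


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11),
            (11, 10), (11, 11), (0, 10), (0, 11), (1, 10), (1, 11)]
    labels, score = ghs_kmeans(docs)
    print("clusters:", len(set(labels)), "BIC:", round(score, 3))
```

Because a new harmony only replaces the worst one in memory when its BIC is lower, the best score in memory never worsens; the number of clusters drops whenever a label empties during refinement, which is one simple way to let k vary while BIC arbitrates. Nothing in this toy guarantees that the three natural groups are recovered for a given seed.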
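The vocabulary-reduction step relies on frequent term sets, which the abstract says are mined with FP-Growth. FP-Growth builds a compressed prefix tree; the Python sketch below swaps in a simpler level-wise (Apriori-style) search, which yields the same frequent term sets (up to `max_size` terms) on small inputs, only less efficiently. `frequent_term_sets`, `reduced_vocabulary`, `min_support` and `max_size` are hypothetical names, and keeping only the terms that appear in some frequent set is an assumed reduction rule, not necessarily the one used in the paper.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=3):
    """Return {frozenset(terms): support} for term sets found in >= min_support docs.

    Level-wise (Apriori-style) search used here as a simple stand-in for FP-Growth.
    """
    doc_sets = [set(d) for d in docs]

    def support(terms):
        # Number of documents containing every term in `terms`.
        return sum(1 for d in doc_sets if terms <= d)

    vocabulary = sorted({t for d in doc_sets for t in d})
    level = [frozenset([t]) for t in vocabulary if support(frozenset([t])) >= min_support]
    result = {}
    size = 1
    while level and size <= max_size:
        for term_set in level:
            result[term_set] = support(term_set)
        size += 1
        # Join frequent sets of the previous level into candidates one term larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every subset one term smaller must already be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, size - 1))]
        level = [c for c in candidates if support(c) >= min_support]
    return result


def reduced_vocabulary(frequent_sets):
    """Assumed reduction rule: keep only terms that occur in some frequent term set."""
    return sorted(set().union(*frequent_sets))


if __name__ == "__main__":
    docs = [["harmony", "search", "cluster"], ["harmony", "search", "music"],
            ["kmeans", "cluster", "harmony"], ["harmony", "search", "cluster"]]
    sets = frequent_term_sets(docs, min_support=2)
    print(reduced_vocabulary(sets))  # ['cluster', 'harmony', 'search']
```

Here "music" and "kmeans" each occur in a single document, so with `min_support=2` they fall out of the vocabulary, while multi-term sets such as {harmony, search} remain available as candidate cluster descriptions.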
Pages: 8