A simple rapid sample-based clustering for large-scale data

Cited by: 1
Authors
Chen, Yewang [1]
Yang, Yuanyuan [1]
Pei, Songwen [2]
Chen, Yi [3, 4]
Du, Jixiang [1]
Affiliations
[1] Huaqiao Univ, Coll Comp Sci & Technol, Xiamen 362021, Peoples R China
[2] Univ Shanghai Sci & Technol, Shanghai Key Lab Modern Opt Syst, Shanghai, Peoples R China
[3] Beijing Technol & Business Univ, China Food Flavor & Nutr Hlth Innovat Ctr, Beijing 100048, Peoples R China
[4] Beijing Technol & Business Univ, Beijing Key Lab Big Data Technol Food Safety, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Clustering; Sample-based clustering; Large-scale data; DBSCAN
DOI
10.1016/j.engappai.2024.108551
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Large-scale data clustering is a crucial task in addressing big data challenges. However, existing approaches often struggle to handle different types of big data both efficiently and effectively. In this paper, we propose a novel sample-based clustering algorithm that is very simple yet extremely efficient: it runs in about O(n × r) expected time, where n is the size of the dataset and r is the number of categories. The method rests on two key assumptions: (1) the data in each sufficient sample has a data distribution, as well as a category distribution, similar to that of the entire dataset; (2) the representatives of each category across all sufficient samples conform to a Gaussian distribution. The algorithm processes data in two stages: it first classifies the data in each local sample independently, and then classifies the data globally by assigning each point to the category of its nearest representative center. The experimental results show that the proposed algorithm is effective and outperforms other current clustering algorithm variants.
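The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it draws a single sample rather than several, and it uses a plain connected-components grouping (points within a hypothetical radius `eps`) as a stand-in for the local clustering step, which the abstract does not specify. All function and parameter names below are assumptions made for illustration. Only the global stage, which compares each of the n points against r centers, reflects the O(n × r) cost mentioned in the abstract.

```python
import math
import random


def local_cluster(sample, eps):
    """Stage 1 stand-in (assumption): label connected components of the
    sample, linking two points when their distance is at most eps."""
    labels = [-1] * len(sample)
    current = 0
    for i in range(len(sample)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(sample)):
                if labels[k] == -1 and math.dist(sample[j], sample[k]) <= eps:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels


def centers_from_labels(sample, labels):
    """Use the mean of each local cluster as that category's representative."""
    groups = {}
    for point, label in zip(sample, labels):
        groups.setdefault(label, []).append(point)
    return [
        tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for _, pts in sorted(groups.items())
    ]


def assign_nearest(points, centers):
    """Stage 2: give every point the index of its nearest center.
    This costs n * r distance computations."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in points
    ]


def sample_based_clustering(points, sample_size, eps, seed=0):
    """Cluster a random sample locally, then label all points globally."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    labels = local_cluster(sample, eps)
    centers = centers_from_labels(sample, labels)
    return assign_nearest(points, centers), centers
```

For example, calling `sample_based_clustering(points, sample_size=40, eps=3.0)` on two well-separated groups of 2D tuples should return one label per point plus the list of representative centers. The sketch relies on the first assumption in the abstract: if the sample misses a category entirely, no center exists for it and its points are absorbed by a neighbouring category.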
Pages: 12