On the quality of k-means clustering based on grouped data

Cited by: 2
Authors
Kaeaerik, Meelis [1]
Paerna, Kalev [1]
Affiliations
[1] Univ Tartu, Inst Stat Math, EE-50090 Tartu, Estonia
Keywords
Grouped data; k-Means; Lloyd's algorithm; Loss-function; Voronoi partitions; QUANTIZATION;
DOI
10.1016/j.jspi.2009.05.021
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208; 070103; 0714;
Abstract
Let us have a probability distribution P (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e. a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129-136]. However, depending on the complexity of the distribution P, the application of this algorithm can be quite resource-consuming. One possibility to overcome the problem is to group the original data and calculate the k-mean on the basis of the grouped data. As a result, the new k-mean will be biased, and our aim is to measure the loss of the quality of approximation caused by such an approach. (C) 2009 Elsevier B.V. All rights reserved.
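The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a squared-error loss, equal-width bins each represented by its own mean, and an evenly spaced initialization, none of which the abstract specifies. The helper names `lloyd_1d`, `group` and `loss` are hypothetical.

```python
import math
import random

def lloyd_1d(points, weights, k, iters=100):
    """Weighted Lloyd iterations on the real line; returns sorted centers."""
    # Assumed initialization: k evenly spaced values between min and max.
    lo, hi = min(points), max(points)
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        sums = [0.0] * k
        mass = [0.0] * k
        # Assignment step: each point goes to its nearest center (Voronoi cell).
        for x, w in zip(points, weights):
            j = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            sums[j] += w * x
            mass[j] += w
        # Update step: each center moves to the weighted mean of its cell.
        new = [sums[j] / mass[j] if mass[j] > 0 else centers[j] for j in range(k)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

def group(points, width):
    """Bin points into intervals of the given width; return bin means and counts."""
    bins = {}
    for x in points:
        b = math.floor(x / width)
        s, n = bins.get(b, (0.0, 0))
        bins[b] = (s + x, n + 1)
    reps = [s / n for s, n in bins.values()]
    counts = [n for _, n in bins.values()]
    return reps, counts

def loss(points, weights, centers):
    """Assumed loss: mean squared distance to the nearest center."""
    total = sum(weights)
    return sum(w * min((x - c) ** 2 for c in centers)
               for x, w in zip(points, weights)) / total

# Synthetic two-component sample (illustrative data, not from the paper).
rng = random.Random(0)
data = [rng.gauss(-2, 1) for _ in range(500)] + [rng.gauss(3, 1) for _ in range(500)]
ones = [1.0] * len(data)

full = lloyd_1d(data, ones, k=2)           # centers from the original data
reps, counts = group(data, width=0.5)      # far fewer points than the original
grouped = lloyd_1d(reps, counts, k=2)      # centers from the grouped data

# Both center sets are evaluated against the ORIGINAL data, so the gap
# between the two values is the quality loss caused by grouping.
loss_full = loss(data, ones, full)
loss_grouped = loss(data, ones, grouped)
print(full, grouped, loss_full, loss_grouped)
```

Evaluating both sets of centers against the original sample is the design choice that matters here: comparing `loss_grouped` with `loss_full` gives one empirical reading of the approximation quality lost by grouping, under the assumptions stated above.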
Pages: 3836-3841
Number of pages: 6
Related papers
50 in total
  • [21] K-means*: Clustering by gradual data transformation
    Malinen, Mikko I.
    Mariescu-Istodor, Radu
    Franti, Pasi
    PATTERN RECOGNITION, 2014, 47 (10) : 3376 - 3386
  • [22] Data decomposition for parallel K-means clustering
    Gursoy, A
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2004, 3019 : 241 - 248
  • [23] Data clustering using K-Means based on Crow Search Algorithm
    K Lakshmi
    N Karthikeyani Visalakshi
    S Shanthi
    Sādhanā, 2018, 43
  • [24] Cleaning RFID data streams based on K-means clustering method
    Lin Qiaomin
    Fa Anqi
    Pan Min
    Xie Qiang
    Du Kun
    Sheng Michael
    The Journal of China Universities of Posts and Telecommunications, 2020, 27 (02) : 72 - 81
  • [25] A Missing Data Complement Method Based on K-means Clustering Analysis
    Shi, Pengjia
    Zhang, Linyao
    2017 IEEE CONFERENCE ON ENERGY INTERNET AND ENERGY SYSTEM INTEGRATION (EI2), 2017,
  • [26] Improved k-means clustering based on Efros distance for longitudinal data
    Sun, Yanhui
    Fang, Liying
    Wang, Pu
    PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 3853 - 3856
  • [27] Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm
    Shi Na
    Liu Xumin
    Guan Yong
    2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010, : 63 - 67
  • [28] A fast K-Means clustering algorithm based on grid data reduction
    Li, Daqi
    Shen, Junyi
    Chen, Hongmin
    2008 IEEE AEROSPACE CONFERENCE, VOLS 1-9, 2008, : 2273 - +
  • [29] Rainfall flow optimization based K-Means clustering for medical data
    Jaya Mabel Rani, Antony
    Pravin, Albert
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (17):
  • [30] Band depth based initialization of K-means for functional data clustering
    Albert-Smet, Javier
    Torrente, Aurora
    Romo, Juan
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2023, 17 (02) : 463 - 484