On the quality of k-means clustering based on grouped data

Cited by: 2

Authors:

Kaeaerik, Meelis [1]

Paerna, Kalev [1]

Affiliation:

[1] Univ Tartu, Inst Stat Math, EE-50090 Tartu, Estonia

Source:

JOURNAL OF STATISTICAL PLANNING AND INFERENCE | 2009 / Vol. 139 / No. 11

Keywords:

Grouped data; k-Means; Lloyd's algorithm; Loss-function; Voronoi partitions; QUANTIZATION

DOI:

10.1016/j.jspi.2009.05.021

Chinese Library Classification:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]

Subject classification codes:

020208; 070103; 0714

Abstract:

Let P be a probability distribution (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e. a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129-136]. However, depending on the complexity of the distribution P, applying this algorithm can be quite resource-consuming. One way to overcome this problem is to group the original data and calculate the k-mean on the basis of the grouped data. The resulting k-mean will be biased, and our aim is to measure the loss in approximation quality caused by this approach. (C) 2009 Elsevier B.V. All rights reserved.
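The grouping idea in the abstract can be illustrated with a minimal Python sketch. It assumes squared-error loss, equal-width bins whose midpoints (weighted by bin counts) stand in for the original observations, and a simple deterministic start; none of these choices is taken from the paper, and the names `lloyd_1d`, `group_equal_width` and `mean_loss` are hypothetical. Lloyd's iteration alternates an assignment step (each point goes to its nearest centre; on R the Voronoi cells are intervals) with an update step (each centre moves to the weighted mean of its cell), and the sketch stops when the centres no longer move.

```python
import random
from typing import List, Sequence, Tuple


def lloyd_1d(points: Sequence[float], weights: Sequence[float], k: int,
             max_iter: int = 200, tol: float = 1e-12) -> List[float]:
    """Weighted Lloyd iteration on R; squared-error loss is an assumption."""
    distinct = sorted(set(points))
    n = len(distinct)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of distinct points")
    # Hypothetical start: k distinct support points spread evenly in sorted order.
    den = max(k - 1, 1)
    centres = [distinct[i * (n - 1) // den] for i in range(k)]
    for _ in range(max_iter):
        sums, mass = [0.0] * k, [0.0] * k
        for x, w in zip(points, weights):
            # Assignment step: index of the nearest centre.
            j = min(range(k), key=lambda c: abs(x - centres[c]))
            sums[j] += w * x
            mass[j] += w
        # Update step: weighted mean of each cell; an empty cell keeps its centre.
        new = [sums[j] / mass[j] if mass[j] > 0 else centres[j] for j in range(k)]
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift <= tol:
            break
    return sorted(centres)


def mean_loss(points: Sequence[float], weights: Sequence[float],
              centres: Sequence[float]) -> float:
    """Weighted mean squared distance from each point to its nearest centre."""
    total = sum(weights)
    return sum(w * min((x - a) ** 2 for a in centres)
               for x, w in zip(points, weights)) / total


def group_equal_width(data: Sequence[float],
                      n_bins: int) -> Tuple[List[float], List[float]]:
    """Hypothetical grouping: equal-width bins represented by their midpoints."""
    lo, hi = min(data), max(data)
    if hi == lo or n_bins < 1:
        raise ValueError("need a non-degenerate range and at least one bin")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    # Drop empty bins; the counts become the weights of the midpoints.
    mids = [lo + (b + 0.5) * width for b in range(n_bins) if counts[b] > 0]
    freq = [float(c) for c in counts if c > 0]
    return mids, freq


if __name__ == "__main__":
    # Synthetic two-component Gaussian mixture, for illustration only.
    rng = random.Random(0)
    data = ([rng.gauss(0.0, 1.0) for _ in range(2000)]
            + [rng.gauss(5.0, 1.0) for _ in range(1000)])
    ones = [1.0] * len(data)
    k = 3
    raw_centres = lloyd_1d(data, ones, k)
    mids, freq = group_equal_width(data, 20)
    grouped_centres = lloyd_1d(mids, freq, k)
    # Both sets of centres are judged on the ORIGINAL data.
    print("centres (raw):    ", [round(a, 3) for a in raw_centres])
    print("centres (grouped):", [round(a, 3) for a in grouped_centres])
    print("loss (raw):    ", mean_loss(data, ones, raw_centres))
    print("loss (grouped):", mean_loss(data, ones, grouped_centres))
```

Evaluating both sets of centres against the original data, as the demo does, gives a crude empirical counterpart of the loss of quality the paper sets out to measure; the Lloyd run on grouped data touches only as many points as there are non-empty bins.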


Pages: 3836-3841

Number of pages: 6

Related records: 50 in total

[21] K-means*: Clustering by gradual data transformation
Malinen, Mikko I.
Mariescu-Istodor, Radu
Franti, Pasi
PATTERN RECOGNITION, 2014, 47 (10) : 3376 - 3386
[22] Data decomposition for parallel K-means clustering
Gursoy, A
PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2004, 3019 : 241 - 248
[23] Data clustering using K-Means based on Crow Search Algorithm
K Lakshmi
N Karthikeyani Visalakshi
S Shanthi
Sādhanā, 2018, 43
[24] Cleaning RFID data streams based on K-means clustering method
Lin Qiaomin
Fa Anqi
Pan Min
Xie Qiang
Du Kun
Sheng Michael
The Journal of China Universities of Posts and Telecommunications, 2020, 27 (02) : 72 - 81
[25] A Missing Data Complement Method Based on K-means Clustering Analysis
Shi, Pengjia
Zhang, Linyao
2017 IEEE CONFERENCE ON ENERGY INTERNET AND ENERGY SYSTEM INTEGRATION (EI2), 2017
[26] Improved k-means clustering based on Efros distance for longitudinal data
Sun, Yanhui
Fang, Liying
Wang, Pu
PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016 : 3853 - 3856
[27] Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm
Shi Na
Liu Xumin
Guan Yong
2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010 : 63 - 67
[28] A fast K-Means clustering algorithm based on grid data reduction
Li, Daqi
Shen, Junyi
Chen, Hongmin
2008 IEEE AEROSPACE CONFERENCE, VOLS 1-9, 2008 : 2273 - +
[29] Rainfall flow optimization based K-Means clustering for medical data
Jaya Mabel Rani, Antony
Pravin, Albert
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (17)
[30] Band depth based initialization of K-means for functional data clustering
Albert-Smet, Javier
Torrente, Aurora
Romo, Juan
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2023, 17 (02) : 463 - 484
