Density-based multiscale data condensation

Cited by: 86
Authors
Mitra, P [1]
Murthy, CA [1]
Pal, SK [1]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700035, W Bengal, India
Keywords
data mining; multiscale condensation; scalability; density estimation; convergence in probability; instance learning
DOI
10.1109/TPAMI.2002.1008381
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion, which distinguishes it from existing density-based approaches. The accuracy of representation by the condensed set is measured in terms of the error in density estimates of the original and reduced sets. Experimental studies on several real-life data sets show that the multiscale approach is superior to several related condensation methods in terms of both condensation ratio and estimation error. The condensed set was also shown experimentally to be effective for important data mining tasks such as classification, clustering, and rule generation on large data sets. Moreover, the algorithm is found empirically to be efficient in terms of sample complexity.
Pages: 734 - 747
Number of pages: 14
Related papers
50 entries in total
  • [31] On Density-Based Data Streams Clustering Algorithms: A Survey
    Amineh Amini
    Teh Ying Wah
    Hadi Saboohi
    Journal of Computer Science and Technology, 2014, 29 : 116 - 141
  • [32] Hierarchical density-based clustering of categorical data and a simplification
    Andreopoulos, Bill
    An, Aijun
    Wang, Xiaogang
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 11 - +
  • [33] Effective Density-Based Clustering Algorithms for Incomplete Data
    Zhonghao Xue
    Hongzhi Wang
    Big Data Mining and Analytics, 2021, 4 (03) : 183 - 194
  • [34] Density-based clustering for bivariate-flow data
    Shu, Hua
    Pei, Tao
    Song, Ci
    Chen, Jie
    Chen, Xiao
    Guo, Sihui
    Liu, Yaxi
    Wang, Xi
    Wang, Xuyang
    Zhou, Chenghu
    INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2022, 36 (09) : 1809 - 1829
  • [35] Density-based clustering for evolving uncertain data stream
    He, Haitao
    Zhao, Jintian
    Journal of Computational Information Systems, 2014, 10 (01) : 419 - 426
  • [36] On Density-Based Data Streams Clustering Algorithms: A Survey
    Amineh Amini
    Teh Ying Wah
    Hadi Saboohi
    Journal of Computer Science & Technology, 2014, 29 (01) : 116 - 141
  • [37] Density-based clustering for road accident data analysis
    Alotaibi, Abdullah S.
    INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2018, 5 (08) : 113 - 121
  • [38] On Density-Based Data Streams Clustering Algorithms: A Survey
    Amini, Amineh
    Teh, Ying Wah
    Saboohi, Hadi
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2014, 29 (01) : 116 - 141
  • [39] Density-based clustering algorithm for mixture data sets
    Huang, De-Cai
    Wu, Tian-Hong
    Kongzhi yu Juece/Control and Decision, 2010, 25 (03) : 416 - 421
  • [40] A Density-Based Random Forest for Imbalanced Data Classification
    Dong, Jia
    Qian, Quan
    FUTURE INTERNET, 2022, 14 (03):