Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data

Cited by: 4
Authors
Umbleja, Kadri [1]
Ichino, Manabu [1]
Yaguchi, Hiroyuki [1]
Affiliations
[1] Tokyo Denki Univ, Saitama, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Conceptual clustering; Quantile method; Symbolic data;
DOI
10.1007/s11634-020-00411-w
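Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes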
020208 ; 070103 ; 0714 ;
Abstract
Symbolic data are aggregated from larger traditional datasets in order to hide entry-specific details and to make it possible to analyse large amounts of data, such as big data, that could not otherwise be handled. Symbolic data may appear in many different, complex forms, such as intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining. The usual methods are not sufficient to cluster these sophisticated data types accurately. Over the years, different approaches have been proposed, but they mainly concentrate on the "macroscopic" similarities between objects. Distributional data, for example symbolic data, have been aggregated from large datasets, and thus even the smallest microscopic differences and similarities become extremely important. This paper proposes a method for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points for comparison makes it possible to identify similarities in small sections of a distribution while producing more adequate hierarchical concepts. The proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and was found to produce more adequate conceptual clusters during experimentation. Furthermore, because it uses quantiles, the algorithm can compare different types of symbolic data easily without any additional complexity.
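The idea of comparing different symbolic data types through a common set of quantile values can be illustrated with a minimal sketch. It is not the paper's algorithm or its dissimilarity measure: the function names (`histogram_quantiles`, `interval_quantiles`, `quantile_distance`), the assumption of uniform density within each bin, and the mean absolute difference used for comparison are all illustrative assumptions.

```python
def histogram_quantiles(edges, probs, m=4):
    """Return m+1 quantile values (at 0, 1/m, ..., 1) of a histogram.

    edges: ascending bin boundaries, len(probs) + 1 values.
    probs: bin weights (assumed non-negative, with a positive sum).
    Assumption: values are spread uniformly inside each bin.
    """
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)
    total = cum[-1]

    values = []
    for k in range(m + 1):
        target = total * k / m
        for i, p in enumerate(probs):
            # Use the first bin whose cumulative weight reaches the target;
            # fall back to the last bin to absorb rounding error.
            if target <= cum[i + 1] or i == len(probs) - 1:
                frac = (target - cum[i]) / p if p > 0 else 0.0
                frac = min(max(frac, 0.0), 1.0)
                values.append(edges[i] + frac * (edges[i + 1] - edges[i]))
                break
    return values


def interval_quantiles(low, high, m=4):
    """Treat an interval as a one-bin histogram (uniform assumption)."""
    return histogram_quantiles([low, high], [1.0], m)


def quantile_distance(qa, qb):
    """Illustrative comparison: mean absolute difference of quantile values."""
    return sum(abs(a - b) for a, b in zip(qa, qb)) / len(qa)


# Hypothetical objects: one interval-valued, one histogram-valued.
q_interval = interval_quantiles(0, 8)
q_histogram = histogram_quantiles([0, 2, 10], [0.5, 0.5])
print(q_interval)
print(q_histogram)
print(quantile_distance(q_interval, q_histogram))
```

Because both objects end up as vectors of the same length, an interval and a histogram can be compared point by point, and a difference confined to one section of the distribution shows up in the corresponding quantile positions.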
Pages: 407-436
Page count: 30
Related papers
50 in total
  • [41] clusterMLD: An Efficient Hierarchical Clustering Method for Multivariate Longitudinal Data
    Zhou, Junyi
    Zhang, Ying
    Tu, Wanzhu
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (03) : 1131 - 1144
  • [42] Fast Hierarchical Clustering Based on Compressed Data and OPTICS
    Breunig, Markus M.
    Kriegel, Hans-Peter
    Sander, Joerg
    LECTURE NOTES IN COMPUTER SCIENCE, 2000, 1910 : 232 - 242
  • [43] Density-based hierarchical clustering for streaming data
    Tu, Q.
    Lu, J. F.
    Yuan, B.
    Tang, J. B.
    Yang, J. Y.
    PATTERN RECOGNITION LETTERS, 2012, 33 (05) : 641 - 645
  • [44] Model-Based Hierarchical Clustering for Categorical Data
    Alalyan, Fahdah
    Zamzami, Nuha
    Bouguila, Nizar
    2019 IEEE 28TH INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE), 2019, : 1424 - 1429
  • [45] A Hierarchical Clustering Method For Big Data Oriented Ciphertext Search
    Chen, Chi
    Zhu, Xiaojie
    Shen, Peisong
    Hu, Jiankun
    2014 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2014, : 559 - 564
  • [46] A novel incremental conceptual hierarchical text clustering method using CFu-tree
    Peng, Tao
    Liu, Lu
    APPLIED SOFT COMPUTING, 2015, 27 : 269 - 278
  • [47] WLAN Floor Location Method Based on Hierarchical Clustering
    Zhong, Weifeng
    Jie Yu
    2015 3RD INTERNATIONAL CONFERENCE ON COMPUTER AND COMPUTING SCIENCE (COMCOMS), 2016, : 41 - 44
  • [48] Mining VIP based on an improved hierarchical clustering method
    Nie, Bin
    Du, Jianqiang
    Liu, Hongnin
    Xu, Guoliang
    Wang, Zhuo
    Zhu, Mingfeng
    Zhang, Qiyun
    2009 SECOND INTERNATIONAL SYMPOSIUM ON KNOWLEDGE ACQUISITION AND MODELING: KAM 2009, VOL 3, 2009, : 323 - +
  • [49] A Static Video Summarization Method Based on Hierarchical Clustering
    Guimaraes, Silvio Jamil F.
    Gomes, Willer
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, 2010, 6419 : 46 - 54
  • [50] Research on Statistical Method for Patent Based on Hierarchical Clustering
    Huang Lucheng
    Cai Shuang
    RECENT ADVANCE IN STATISTICS APPLICATION AND RELATED AREAS, PTS 1 AND 2, 2008, : 1142 - 1147