Empowering graph segmentation methods with SOMs and CONN similarity for clustering large and complex data

Cited by: 0
Authors
Erzsébet Merényi
Joshua Taylor
Affiliations
[1] Rice University,Department of Statistics
[2] Rice University,Department of Electrical and Computer Engineering
Source
Neural Computing and Applications | 2020 / Vol. 32
Keywords
SOM clustering; Graph segmentation; CONN similarity; Big Data; Automation
DOI
Not available
Abstract
High-dimensional, large, and noisy data with complex structure challenge the limits of many clustering algorithms including modern graph segmentation methods. SOM-based clustering has been shown capable of capturing many clusters of widely varying statistical properties in such data. However, to date the best discovery results are produced by interactive extraction of clusters from informative SOM visualizations. This does not scale for Big Data, large archives, or near-real-time analyses. We approach this challenge by infusing SOM knowledge into leading automatic graph segmentation algorithms, which produce extremely poor results when segmenting the SOM prototypes without this information, and which would take a prohibitively long time to segment the input data sets. The knowledge translation occurs by casting the SOM prototypes as vertices and the CONN similarity measure as edge weightings of a graph which is then presented to graph segmentation algorithms. The resulting performance closely approximates the precision of the interactive SOM segmentation for complicated data and, at the same time, is extremely fast and memory-efficient. We demonstrate the effectiveness on a simple synthetic data set and on a very realistic fully labeled synthetic hyperspectral image. We also examine performance dependence on available parametrizations of the graph segmentation algorithms, in combination with parametrizations of the CONN similarity measure.
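The graph construction described in the abstract (prototypes as vertices, CONN similarity as edge weights) can be illustrated with a minimal sketch. It assumes the commonly cited definition of CONN, in which the weight of an edge between two prototypes counts the data vectors whose best-matching and second-best-matching prototypes are that pair, in either order. The prototypes here are hypothetical hand-picked points rather than the output of a trained SOM, and the `segment` function is a simple stand-in (connected components over thresholded edges), not one of the graph segmentation algorithms evaluated in the paper. The names `conn_graph`, `segment`, and `min_weight` are illustrative only.

```python
from collections import defaultdict
from math import dist  # available in Python 3.8+


def conn_graph(data, prototypes):
    """Build a symmetric CONN-style weight map {(i, j): count} with i < j.

    Assumed definition: edge (i, j) counts the data vectors whose best and
    second-best matching prototypes are i and j, in either order.
    """
    weights = defaultdict(int)
    for x in data:
        # Rank prototype indices by Euclidean distance to x.
        ranked = sorted(range(len(prototypes)),
                        key=lambda k: dist(x, prototypes[k]))
        bmu, second = ranked[0], ranked[1]
        weights[(min(bmu, second), max(bmu, second))] += 1
    return dict(weights)


def segment(n_vertices, weights, min_weight=1):
    """Stand-in segmentation: connected components over edges >= min_weight."""
    parent = list(range(n_vertices))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (i, j), w in weights.items():
        if w >= min_weight:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(v), len(roots)) for v in range(n_vertices)]


# Toy example: four hypothetical prototypes forming two well-separated groups.
prototypes = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
data = [(0.2, 0.1), (0.8, -0.1), (0.4, 0.0),
        (10.1, 0.0), (10.9, 0.2), (10.6, 0.0)]

weights = conn_graph(data, prototypes)
labels = segment(len(prototypes), weights)
print(weights)
print(labels)
```

Because only the prototype graph is segmented, the cost of the segmentation step depends on the number of prototypes rather than on the number of data vectors. Raising `min_weight` in this sketch drops weakly supported edges before the components are formed.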
Pages: 18161-18178
Page count: 17