Graph clustering-based discretization of splitting and merging methods (GraphS and GraphM)

Cited by: 20
Authors
Sriwanna, Kittakorn [1]
Boongoen, Tossapon [1]
Iam-On, Natthakan [1]
Affiliations
[1] Mae Fah Luang Univ, Sch Informat Technol, Phahon Yothin Rd, Muang 57100, Chiang Rai, Thailand
Keywords
Multivariate discretization; Graph clustering; Normalized cuts; Normalized association; Data mining; ALGORITHM; TESTS
DOI
10.1186/s13673-017-0103-8
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Discretization plays a major role as a data preprocessing technique in machine learning and data mining. Recent studies have focused on multivariate discretization, which considers relations among attributes. The general goal of this approach is to obtain discrete data that preserve most of the semantics exhibited by the original continuous data. However, many techniques generate discrete data that are less useful because the natural groups in the data are not maintained. This paper presents a novel graph clustering-based discretization algorithm that encodes different similarity measures into a graph representation of the examined data. This representation allows more refined data-wise relations to be obtained and used with an effective graph clustering technique based on normalized association to discover natural groups accurately. The approach is evaluated empirically on 30 standard datasets and 20 imbalanced datasets and compared with 11 well-known discretization algorithms using 4 classifiers. The results suggest that the new approach preserves the natural groups and usually achieves better classifier performance and a more desirable number of intervals than the comparative methods.
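The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of splitting by normalized association, not the authors' GraphS or GraphM procedure. It assumes a single continuous attribute, a Gaussian similarity with a hypothetical `sigma` parameter, and an exhaustive search over cut points; the function names are illustrative. It builds a similarity graph over the sorted values and picks the cut point whose two sides have the highest normalized association, which is equivalent to the lowest normalized cut because Ncut = 2 - Nassoc for a two-way partition.

```python
import math


def similarity_matrix(values, sigma=1.0):
    """Gaussian similarity between every pair of values (an assumed choice)."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
            for a in values]


def assoc(w, rows, cols):
    """Total edge weight from the nodes in `rows` to the nodes in `cols`."""
    return sum(w[i][j] for i in rows for j in cols)


def normalized_association(w, part_a, part_b):
    """Nassoc(A, B) = assoc(A, A)/assoc(A, V) + assoc(B, B)/assoc(B, V)."""
    everything = part_a + part_b
    return (assoc(w, part_a, part_a) / assoc(w, part_a, everything)
            + assoc(w, part_b, part_b) / assoc(w, part_b, everything))


def normalized_cut(w, part_a, part_b):
    """Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)."""
    everything = part_a + part_b
    cut = assoc(w, part_a, part_b)
    return (cut / assoc(w, part_a, everything)
            + cut / assoc(w, part_b, everything))


def best_split(values, sigma=1.0):
    """Return (cut_point, score) for the two-interval split of the sorted
    values that maximizes normalized association, or None if no split exists."""
    ordered = sorted(values)
    w = similarity_matrix(ordered, sigma)
    n = len(ordered)
    best = None
    for k in range(1, n):
        if ordered[k] == ordered[k - 1]:
            continue  # no cut point can separate two equal values
        left, right = list(range(k)), list(range(k, n))
        score = normalized_association(w, left, right)
        if best is None or score > best[1]:
            midpoint = (ordered[k - 1] + ordered[k]) / 2
            best = (midpoint, score)
    return best


# Toy attribute with two visibly separated groups of values.
values = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
cut_point, score = best_split(values)
print(f"cut point: {cut_point:.2f}, normalized association: {score:.4f}")
```

On this toy attribute the chosen cut point falls in the gap between the two groups, so both resulting intervals keep their natural group intact. A multivariate method such as the one described above would instead derive the graph from relations across several attributes.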
Pages: 39