An Integrated Cluster Analysis and Validity Test Platform for the Compression-based Clustering Approach

Cited by: 0
Authors
Cernian, Alexandra [1]
Carstoiu, Dorin [1]
Olteanu, Adriana [1]
Sgarciu, Valentin [1]
Affiliations
[1] Univ Politehn Bucuresti, Bucharest, Romania
Source
STUDIES IN INFORMATICS AND CONTROL | 2015 / Vol. 24 / Issue 02
Keywords
clustering; compression; cluster analysis; FScore; expert system
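DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract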
This paper focuses on compression-based clustering. It aims to determine the most suitable combinations of algorithms for different clustering contexts (text, heterogeneous data, Web pages, metadata and so on) and to establish whether using compression with traditional clustering methods leads to better performance. To this end, we propose an integrated cluster analysis test platform, called EasyClustering, which incorporates two subsystems: a clustering component and a cluster validity expert system that automatically determines the quality of a clustering solution by computing the FScore value. The experimental results address two main directions: determining the best approach for compression-based clustering in terms of context, compression algorithms and clustering algorithms, and validating the ability of the cluster analysis expert system to determine the quality of the clustering solutions. After conducting a set of 324 clustering tests, we concluded that compressing the input when using traditional clustering methods increases the quality of the clustering solutions, leading to results comparable to those of the NCD (normalized compression distance). The cluster analysis expert system has been 100% accurate so far, so we estimate that any deviation that does occur will be minimal.
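The two quantities the abstract relies on, the NCD and the FScore, can be illustrated with a minimal Python sketch. This is not the EasyClustering implementation: the function names (`compressed_size`, `ncd`, `fscore`) are hypothetical, zlib is assumed as the compressor purely for convenience, and the FScore shown is the commonly used class-weighted definition (for each true class, take the best F-measure over all clusters, then weight by class size), which may differ in detail from the one used in the paper.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def fscore(true_labels, cluster_labels) -> float:
    """Class-weighted FScore of a clustering against reference labels."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j  # share of the cluster that belongs to the class
            recall = n_ij / n_i     # share of the class captured by the cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best   # weight each class by its relative size
    return total


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 50
    other = bytes(range(256)) * 4
    print("NCD(text, text): ", ncd(text, text))
    print("NCD(text, other):", ncd(text, other))
    print("FScore:", fscore(["a", "a", "b", "b"], [0, 0, 1, 1]))
```

Under these assumptions, an object compared with itself should yield a smaller NCD than two unrelated objects, and a clustering that reproduces the reference classes exactly yields an FScore of 1.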
Pages: 151-158
Page count: 8
Related papers
50 in total
  • [1] Compression-based hierarchical clustering of SAR images
    Cerra, Daniele
    Datcu, Mihai
    REMOTE SENSING LETTERS, 2010, 1 (03) : 141 - 147
  • [2] Influence of music representation on compression-based clustering
    Gonzalez-Pardo, Antonio
    Granados, Ana
    Camacho, David
    de Borja Rodriguez, Francisco
    2010 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2010,
  • [3] Compression-based SoC Test Infrastructures
    Dalmasso, Julien
    Flottes, Marie-Lise
    Rouzeyre, Bruno
    VLSI-SOC: ADVANCED TOPICS ON SYSTEMS ON A CHIP, 2009, 291 : 53 - 67
  • [4] Relevance of Contextual Information in Compression-Based Text Clustering
    Granados, Ana
    Martinez, Rafael
    Camacho, David
    de Borja Rodriguez, Francisco
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2010, 2010, 6283 : 259 - 266
  • [5] Compact Test Set Generation for Test Compression-based Designs
    Eggersgluess, Stephan
    2015 20TH IEEE EUROPEAN TEST SYMPOSIUM (ETS), 2015,
  • [6] Compression-based analysis of metamorphic malware
    Department of Computer Science, San Jose State University, San Jose, CA 95192, United States
    Int. J. Secur. Netw., 2 (124-136):
  • [7] A Compression-Based Method for Stemmatic Analysis
    Roos, Teemu
    Heikkila, Tuomas
    Myllymaki, Petri
    ECAI 2006, PROCEEDINGS, 2006, 141 : 805 - +
  • [8] A Compression-Based Dissimilarity Measure for Multi-task Clustering
    Nguyen Huy Thach
    Shao, Hao
    Tong, Bin
    Suzuki, Einoshin
    FOUNDATIONS OF INTELLIGENT SYSTEMS, 2011, 6804 : 123 - 132
  • [9] Clustering Heterogeneous Web Data Using Clustering by Compression. Cluster Validity
    Cernian, Alexandra
    Carstoiu, Dorin
    Olteanu, Adriana
    PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, 2009, : 123 - 126
  • [10] Compression-Based Clustering of Video Human Activity Using an ASCII Encoding
    Sarasa, Guillermo
    Montero, Aaron
    Granados, Ana
    Rodriguez, Francisco B.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT II, 2018, 11140 : 66 - 75