MDCStream: Stream Data Generator for Testing Analysis Algorithms

Times cited: 3
Authors
Iglesias, Felix [1]
Ojdanic, Denis [1]
Hartl, Alexander [1]
Zseby, Tanja [1]
Affiliation
[1] TU Wien, Inst Telecommun, Vienna, Austria
Keywords
data generation; stream data; synthetic data; multi-dimensional data; concept drift; nonstationarity; MATLAB
DOI
10.1145/3388831.3388832
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
The establishment of modern technological paradigms such as ubiquitous computing, big data, cyber-physical systems, and communication networks has greatly increased the need for efficient and effective data stream analysis. MDCStream is a MATLAB tool for generating time-dependent numerical datasets in order to stress-test stream data classification, clustering, and outlier detection algorithms. Because it is built on MDCGen, MDCStream is highly flexible and can create a wide variety of data scenarios. To illustrate its potential, we tested a stream data clustering algorithm recently proposed in the literature on datasets generated with MDCStream. The datasets were designed to pose challenges related to space geometries and concept drift.
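As a rough illustration of the kind of time-dependent data described above, the following Python sketch generates a two-dimensional stream whose Gaussian clusters drift linearly over time, which is one simple form of concept drift. It is not MDCStream's algorithm or API (MDCStream is a MATLAB tool built on MDCGen). The function `drifting_stream` and its parameters `centers`, `velocities`, and `sigma` are hypothetical names chosen only for this example.

```python
import random
from typing import List, Tuple

# Hypothetical helper for illustration only; not part of MDCStream or MDCGen.
def drifting_stream(n_samples: int,
                    centers: List[Tuple[float, float]],
                    velocities: List[Tuple[float, float]],
                    sigma: float = 0.5,
                    seed: int = 0) -> List[Tuple[int, float, float, int]]:
    """Generate a 2-D stream of (timestamp, x, y, label) tuples.

    Each cluster is an isotropic Gaussian whose center moves linearly with
    its own velocity, so the data distribution changes over time.
    """
    if len(centers) != len(velocities):
        raise ValueError("centers and velocities must have the same length")
    rng = random.Random(seed)  # local RNG for reproducible streams
    stream = []
    for t in range(n_samples):
        k = rng.randrange(len(centers))  # pick a cluster uniformly at random
        # Cluster center at time t: initial position plus linear drift.
        cx = centers[k][0] + velocities[k][0] * t
        cy = centers[k][1] + velocities[k][1] * t
        # Sample the point around the current (drifted) center.
        x = rng.gauss(cx, sigma)
        y = rng.gauss(cy, sigma)
        stream.append((t, x, y, k))
    return stream


if __name__ == "__main__":
    data = drifting_stream(1000,
                           centers=[(0.0, 0.0), (5.0, 5.0)],
                           velocities=[(0.01, 0.0), (0.0, -0.01)])
    print(data[:3])
```

Feeding such a stream one tuple at a time to a stream clustering algorithm is one way to check whether the algorithm follows the moving centers. Linear drift is only one of many possible drift patterns; abrupt, gradual, or recurring changes would need a different center schedule.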
Pages: 56-63
Number of pages: 8