MDCStream: Stream Data Generator for Testing Analysis Algorithms

Cited by: 3
Authors
Iglesias, Felix [1]
Ojdanic, Denis [1]
Hartl, Alexander [1]
Zseby, Tanja [1]
Affiliations
[1] TU Wien, Inst Telecommun, Vienna, Austria
Keywords
data generation; stream data; synthetic data; multi-dimensional data; concept drift; nonstationarity; MATLAB
DOI
10.1145/3388831.3388832
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The establishment of modern technological paradigms such as ubiquitous computing, big data, cyber-physical systems, or communication networks has strongly increased the need for efficient and effective data stream analysis. MDCStream is a MATLAB tool for generating time-dependent numerical datasets to stress-test stream data classification, clustering, and outlier detection algorithms. Because MDCStream is built on MDCGen, it offers high flexibility for creating a wide diversity of data scenarios. To illustrate the potential of MDCStream, we tested a stream data clustering algorithm recently proposed in the literature on datasets generated with MDCStream. The datasets were designed to pose challenges related to space geometries and concept drift.
Pages: 56-63
Number of pages: 8