Synthetic Data Generation for Statistical Testing

Authors
Soltana, Ghanem [1]
Sabetzadeh, Mehrdad [1]
Briand, Lionel C. [1]
Affiliation
[1] Univ Luxembourg, SnT Ctr Secur Reliabil & Trust, Luxembourg, Luxembourg
Funding
European Research Council
Keywords
Data Generation; Usage-based Statistical Testing; Model-Driven Engineering; UML; OCL; Reliability
DOI
Not available
Chinese Library Classification
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of simultaneously fulfilling the statistical representativeness and logical validity requirements.
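The abstract describes a two-step scheme: first sample records that match the target statistics while ignoring logical constraints, then iteratively tweak the sample to remove constraint violations, steering each tweak toward the target statistics. The Python sketch below is a toy illustration of that idea only, not the authors' implementation (which, per the keywords, works with UML/OCL models). Everything in it is a hypothetical assumption: the two attributes (`age_group`, `marital`), their target proportions in `TARGETS`, the constraint in `is_valid` ("a minor must be single"), the L1 gap computed by `deviation` as the statistical objective, and the two tweak kinds (reassigning a value, or swapping a value with another record so the marginals stay unchanged).

```python
import random
from collections import Counter

# Hypothetical target marginals (illustrative values, not taken from the paper).
TARGETS = {
    "age_group": {"minor": 0.2, "adult": 0.8},
    "marital": {"single": 0.5, "married": 0.4, "divorced": 0.1},
}


def is_valid(record):
    """Hypothetical logical constraint: a minor must be single."""
    return record["age_group"] == "adult" or record["marital"] == "single"


def deviation(counts, n):
    """L1 gap between observed and target marginal proportions (assumed objective)."""
    return sum(
        abs(counts[attr][value] / n - p)
        for attr, dist in TARGETS.items()
        for value, p in dist.items()
    )


def marginal_counts(sample):
    """Count how often each value of each attribute occurs in the sample."""
    return {attr: Counter(r[attr] for r in sample) for attr in TARGETS}


def generate_unconstrained(n, rng):
    """Step 1: draw each attribute from its target marginal, ignoring constraints."""
    return [
        {attr: rng.choices(list(dist), weights=list(dist.values()))[0]
         for attr, dist in TARGETS.items()}
        for _ in range(n)
    ]


def best_tweak(sample, i, counts):
    """Return (score, attr, new_value, partner_index_or_None) for the best valid tweak."""
    n, record, options = len(sample), sample[i], []
    for attr, dist in TARGETS.items():
        old = record[attr]
        # Tweak A: reassign one value of the violating record (shifts the marginals).
        for value in dist:
            if value != old and is_valid({**record, attr: value}):
                counts[attr][old] -= 1
                counts[attr][value] += 1
                options.append((deviation(counts, n), attr, value, None))
                counts[attr][value] -= 1
                counts[attr][old] += 1
        # Tweak B: swap the value with another record (marginals unchanged).
        for j, other in enumerate(sample):
            if (j != i and other[attr] != old
                    and is_valid({**record, attr: other[attr]})
                    and is_valid({**other, attr: old})):
                options.append((deviation(counts, n), attr, other[attr], j))
                break
    return min(options, key=lambda o: o[0]) if options else None


def repair(sample, max_rounds=10):
    """Step 2: fix violations, always picking the tweak with the smallest L1 gap."""
    counts = marginal_counts(sample)
    for _ in range(max_rounds):
        bad = [i for i, r in enumerate(sample) if not is_valid(r)]
        if not bad:
            break
        for i in bad:
            if is_valid(sample[i]):  # may already be fixed as a swap partner
                continue
            tweak = best_tweak(sample, i, counts)
            if tweak is None:
                continue
            _, attr, value, partner = tweak
            old = sample[i][attr]
            if partner is None:
                counts[attr][old] -= 1
                counts[attr][value] += 1
            else:
                sample[partner][attr] = old
            sample[i][attr] = value
    return sample


rng = random.Random(42)
population = generate_unconstrained(1000, rng)
before = deviation(marginal_counts(population), len(population))
population = repair(population)
after = deviation(marginal_counts(population), len(population))
print(f"all valid: {all(map(is_valid, population))}, "
      f"L1 gap before={before:.3f}, after={after:.3f}")
```

Because a swap leaves the marginal counts unchanged and is offered as a candidate whenever a suitable partner record exists, the greedy choice in this sketch never makes the L1 gap worse than it was before each tweak. A realistic population would involve many more attributes, constraints spanning several records, and richer statistics than simple marginals.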
Pages
872-882 (11 pages)