Synthetic Data Generation for Statistical Testing

Cited by: 0
Authors
Soltana, Ghanem [1]
Sabetzadeh, Mehrdad [1]
Briand, Lionel C. [1]
Affiliations
[1] Univ Luxembourg, SnT Ctr Secur Reliabil & Trust, Luxembourg, Luxembourg
Funding
European Research Council
Keywords
Data Generation; Usage-based Statistical Testing; Model-Driven Engineering; UML; OCL; Reliability
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of simultaneously fulfilling the statistical representativeness and logical validity requirements.
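The two-phase idea in the abstract (sample for statistical representativeness first, then repair constraint violations while steering toward the target statistics) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper works with UML and OCL, whereas the sketch below swaps in plain Python predicates, and the attributes, probabilities, constraint, and function names are all hypothetical. The greedy repair shown here always produces valid records but does not guarantee that the target distributions are matched exactly.

```python
import random
from collections import Counter

# Hypothetical target usage profile (illustrative numbers, not from the paper).
AGE_GROUPS = {"child": 0.2, "adult": 0.6, "senior": 0.2}
STATUSES = {"single": 0.5, "married": 0.4, "widowed": 0.1}


def is_valid(record):
    """Hypothetical logical constraint: a child can only be single."""
    return not (record["age_group"] == "child" and record["status"] != "single")


def sample_from(dist, rng):
    """Draw one value from a {value: probability} mapping."""
    values, weights = zip(*dist.items())
    return rng.choices(values, weights=weights, k=1)[0]


def generate_sample(n, rng):
    """Phase 1: draw each attribute from its target distribution, ignoring constraints."""
    return [
        {"age_group": sample_from(AGE_GROUPS, rng), "status": sample_from(STATUSES, rng)}
        for _ in range(n)
    ]


def most_underrepresented(counts, dist, n, candidates):
    """Pick the candidate whose observed share lags its target share the most."""
    return max(candidates, key=lambda v: dist[v] - counts[v] / n)


def repair(sample):
    """Phase 2: fix each invalid record, steering the tweak toward the target statistics."""
    n = len(sample)
    age_counts = Counter(r["age_group"] for r in sample)
    for record in sample:
        if is_valid(record):
            continue
        # Keep the status; move the record to an age group that makes it valid.
        age_counts[record["age_group"]] -= 1
        candidates = [a for a in AGE_GROUPS if is_valid({**record, "age_group": a})]
        record["age_group"] = most_underrepresented(age_counts, AGE_GROUPS, n, candidates)
        age_counts[record["age_group"]] += 1
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    population = repair(generate_sample(1000, rng))
    print(all(is_valid(r) for r in population))
```

In this sketch the repair step only ever changes the age group, so the status distribution from phase 1 is preserved while the age distribution absorbs the corrections. A fuller version would weigh several candidate tweaks per record against all target distributions.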
Pages: 872-882
Page count: 11