Synthetic Data Generation for Statistical Testing

Cited by: 0
Authors
Soltana, Ghanem [1]
Sabetzadeh, Mehrdad [1]
Briand, Lionel C. [1]
Affiliations
[1] Univ Luxembourg, SnT Ctr Secur Reliabil & Trust, Luxembourg, Luxembourg
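Funding
European Research Council
Keywords
Data Generation; Usage-based Statistical Testing; Model-Driven Engineering; UML; OCL; Reliability
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract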
Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of simultaneously fulfilling the statistical representativeness and logical validity requirements.
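
The two-phase idea in the abstract (sample for statistical representativeness first, then tweak to repair constraint violations while steering back toward the target statistics) can be illustrated with a small Python sketch. This is not the authors' implementation, which the abstract does not detail: the attributes, the target frequencies in `TARGET`, the constraint in `is_valid`, and the greedy hill-climbing repair in `tweak` are all assumptions made here for illustration only.

```python
import itertools
import random
from collections import Counter

# Hypothetical usage profile: target relative frequency of each attribute value.
TARGET = {
    "age_band": {"child": 0.2, "adult": 0.8},
    "status": {"single": 0.5, "married": 0.5},
}


def is_valid(record):
    """Hypothetical logical constraint: a child cannot be married."""
    return not (record["age_band"] == "child" and record["status"] == "married")


def draw(rng, dist):
    """Draw one value according to the given frequency table."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def generate_sample(n, rng):
    """Phase 1: sample every attribute independently, ignoring the constraint."""
    return [{attr: draw(rng, dist) for attr, dist in TARGET.items()} for _ in range(n)]


def distance(sample):
    """Sum over attributes of the L1 gap between observed and target frequencies."""
    n = len(sample)
    total = 0.0
    for attr, dist in TARGET.items():
        counts = Counter(record[attr] for record in sample)
        total += sum(abs(counts[value] / n - p) for value, p in dist.items())
    return total


def tweak(sample, max_passes=20):
    """Phase 2 (greedy sketch): replace invalid records, and keep replacing
    valid ones while doing so moves the sample closer to the target."""
    combos = itertools.product(*TARGET.values())
    candidates = [dict(zip(TARGET, combo)) for combo in combos]
    candidates = [c for c in candidates if is_valid(c)]
    for _ in range(max_passes):
        changed = False
        for i, original in enumerate(sample):
            # An invalid record must be replaced; a valid one only if it helps.
            best_score = distance(sample) if is_valid(original) else float("inf")
            best = original
            for candidate in candidates:
                sample[i] = candidate
                score = distance(sample)
                if score < best_score:
                    best_score, best = score, candidate
            if best is original:
                sample[i] = original
            else:
                sample[i] = dict(best)  # copy so records do not share state
                changed = True
        if not changed:
            break
    return sample


rng = random.Random(0)
sample = tweak(generate_sample(200, rng))
print(all(is_valid(r) for r in sample), round(distance(sample), 3))
```

The sketch recomputes `distance` from scratch for every candidate, which keeps the code short but would not scale to a realistic population; maintaining the per-attribute counts incrementally would be the obvious refinement.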
Pages: 872-882
Page count: 11