Synthetic Data Generation for Statistical Testing

Cited by: 0

Authors:

Soltana, Ghanem [1]

Sabetzadeh, Mehrdad [1]

Briand, Lionel C. [1]

Affiliations:

[1] Univ Luxembourg, SnT Ctr Secur Reliabil & Trust, Luxembourg, Luxembourg

Source:

PROCEEDINGS OF THE 2017 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE'17) | 2017

Funding:

European Research Council;

Keywords:

Data Generation; Usage-based Statistical Testing; Model-Driven Engineering; UML; OCL; RELIABILITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of simultaneously fulfilling the statistical representativeness and logical validity requirements.
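The two-phase idea in the abstract (sample for statistical representativeness first, then repair logical violations while steering back toward the target statistics) can be illustrated with a small sketch. This is not the authors' implementation, which builds on UML and OCL. The record fields, the target distributions `AGE_DIST` and `STATUS_DIST`, the validity rule in `is_valid`, and the greedy repair strategy are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

# Hypothetical target distributions (illustrative numbers, not from the paper).
AGE_DIST = {"child": 0.2, "adult": 0.6, "senior": 0.2}
STATUS_DIST = {"student": 0.25, "employed": 0.55, "retired": 0.20}


def is_valid(rec):
    """Hypothetical logical constraint: a child can only be a student."""
    return rec["age"] != "child" or rec["status"] == "student"


def sample(dist, rng):
    """Draw one value from a discrete distribution given as {value: probability}."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def deviation(records):
    """Sum of absolute gaps between observed and target relative frequencies."""
    n = len(records)
    total = 0.0
    for field, dist in (("age", AGE_DIST), ("status", STATUS_DIST)):
        counts = Counter(r[field] for r in records)
        total += sum(abs(counts[v] / n - p) for v, p in dist.items())
    return total


def generate(n, seed=0):
    rng = random.Random(seed)
    # Phase 1: sample each field independently, ignoring the constraint.
    records = [
        {"age": sample(AGE_DIST, rng), "status": sample(STATUS_DIST, rng)}
        for _ in range(n)
    ]
    # Phase 2: repair each violating record by changing a single field,
    # keeping the valid candidate that leaves the whole sample closest
    # to the target distributions.
    for i, rec in enumerate(records):
        if is_valid(rec):
            continue
        candidates = [dict(rec, age=a) for a in AGE_DIST]
        candidates += [dict(rec, status=s) for s in STATUS_DIST]
        candidates = [c for c in candidates if is_valid(c)]
        records[i] = min(
            candidates,
            key=lambda c: deviation(records[:i] + [c] + records[i + 1:]),
        )
    return records


if __name__ == "__main__":
    data = generate(1000, seed=42)
    print("all valid:", all(is_valid(r) for r in data))
    print("deviation from targets:", round(deviation(data), 3))
```

In this sketch every repair moves the sample away from at least one target frequency, so the greedy choice only limits the drift. A fuller procedure would also adjust already valid records to restore the target statistics.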

Citation

Pages: 872-882

Page count: 11
