Synthetic data generation by probabilistic PCA

Cited by: 0
Author
Park, Min-Jeong [1]
Affiliation
[1] Stat Korea, Govt Complex Daejeon, 189 Cheongsa Ro, Daejeon 35208, South Korea
Keywords
synthetic data; probabilistic principal component analysis; imputation
DOI
10.5351/KJAS.2023.36.4.279
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Generating synthetic data sets by the sequential regression multiple imputation (SRMI) method is well established. The R package synthpop is widely used for generating synthetic data with the SRMI approach. In this paper, I suggest generating synthetic data based on the probabilistic principal component analysis (PPCA) method. Two simple data sets are used in a simulation study to compare the SRMI and PPCA approaches. The simulation results demonstrate that pairwise coefficients in synthetic data sets generated by PPCA can be closer to the original ones than those generated by SRMI. Furthermore, for data types for which PPCA applications are well established, such as time series data, the PPCA approach can be extended to generate synthetic data sets.
Pages: 279-294
Page count: 16
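The PPCA idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the standard closed-form maximum-likelihood PPCA fit (loadings from the leading eigenvectors of the sample covariance, noise variance from the average of the discarded eigenvalues) and then draws new rows from the fitted Gaussian model. This is not necessarily the procedure used in the paper. The function names `fit_ppca` and `sample_ppca`, the toy data, and the choice of one latent component are all hypothetical, and reading "pairwise coefficients" as pairwise correlations is an assumption made only for the comparison at the end.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form ML fit of a PPCA model with q latent components.

    Returns the mean vector, the d x q loading matrix W, and the
    isotropic noise variance sigma2.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 0 < q < d:
        raise ValueError("q must satisfy 0 < q < number of columns")
    mu = X.mean(axis=0)
    # Maximum-likelihood (biased) sample covariance.
    S = np.cov(X, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: mean of the d - q discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Loadings: leading eigenvectors scaled by sqrt(lambda_i - sigma2).
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale
    return mu, W, sigma2


def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic rows x = mu + W z + eps from the fitted model."""
    d, q = W.shape
    z = rng.standard_normal((n, q))                      # latent scores
    eps = np.sqrt(sigma2) * rng.standard_normal((n, d))  # isotropic noise
    return mu + z @ W.T + eps


# Toy "original" data (hypothetical): three correlated numeric columns.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 1))
X_orig = latent @ np.array([[1.0, 0.8, -0.5]]) + 0.3 * rng.standard_normal((500, 3))

mu, W, sigma2 = fit_ppca(X_orig, q=1)
X_syn = sample_ppca(mu, W, sigma2, n=500, rng=rng)

# Compare pairwise correlations of the original and synthetic data.
print(np.round(np.corrcoef(X_orig, rowvar=False), 2))
print(np.round(np.corrcoef(X_syn, rowvar=False), 2))
```

Because every synthetic row is drawn from the fitted distribution rather than copied or perturbed from an original record, the sketch handles only continuous, roughly Gaussian variables. Categorical or skewed columns would need additional modeling that is not shown here.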