Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment

Cited by: 5
Authors
Ramzan, Faisal [1]
Sartori, Claudio [2]
Consoli, Sergio [3]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy
[2] Univ Bologna, Dept Comp Sci & Engn, I-40126 Bologna, Italy
[3] European Commiss, Joint Res Ctr DG JRC, Brussels, Belgium
Keywords
generative adversarial networks; deep learning; data augmentation; synthetic data; big data
DOI
10.3390/ai5020035
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Generating synthetic data is a complex task that requires accurately replicating the statistical and mathematical properties of the original data. In sectors such as finance, using and sharing real data for research or model development can pose substantial privacy risks because the data contain sensitive information. Authentic data may also be scarce, particularly in specialized domains where acquiring ample, varied, and high-quality data is difficult or costly, and this scarcity can limit the training and testing of machine-learning models. In this paper, we address this challenge. In particular, our task is to synthesize a dataset with properties similar to those of an input dataset about the stock market. The input dataset is anonymized and consists of very few columns and rows; it contains many inconsistencies, such as missing rows and duplicates, and its values are not normalized, scaled, or balanced. We explore the use of generative adversarial networks, a deep-learning technique, to generate synthetic data and evaluate its quality against the input stock dataset. Our innovation involves generating artificial datasets that mimic the statistical properties of the input elements without revealing complete information. For example, synthetic datasets can capture the distribution of stock prices, trading volumes, and market trends observed in the original dataset. The generated datasets cover a wider range of scenarios and variations, enabling researchers and practitioners to explore different market conditions and investment strategies. This diversity can enhance the robustness and generalization of machine-learning models. We evaluate our synthetic data in terms of the mean, similarities, and correlations.
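The abstract names the mean and correlations as evaluation criteria but does not give formulas. As a rough illustration only, the sketch below compares per-column means and pairwise Pearson correlations between a "real" table and a "synthetic" one. Everything in it is an assumption for illustration: the helper names (`column_means`, `pearson`, `correlation_matrix`, `compare`), the toy price/volume columns, and the use of a second random sample as a stand-in for GAN output. It is not the authors' evaluation procedure.

```python
import math
import random


def column_means(rows):
    """Mean of each column in a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(rows):
    """Pairwise Pearson correlations between all columns."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]


def compare(real, synthetic):
    """Largest absolute gap in column means and in pairwise correlations."""
    mean_gap = max(
        abs(r - s)
        for r, s in zip(column_means(real), column_means(synthetic))
    )
    corr_real = correlation_matrix(real)
    corr_syn = correlation_matrix(synthetic)
    corr_gap = max(
        abs(a - b)
        for row_r, row_s in zip(corr_real, corr_syn)
        for a, b in zip(row_r, row_s)
    )
    return mean_gap, corr_gap


def make_rows(rng, n):
    """Toy (price, volume) rows; hypothetical data, not from the paper."""
    rows = []
    for _ in range(n):
        price = rng.gauss(100.0, 5.0)
        volume = 2.0 * price + rng.gauss(0.0, 3.0)
        rows.append((price, volume))
    return rows


rng = random.Random(0)
real = make_rows(rng, 500)
synthetic = make_rows(rng, 500)  # stand-in for generator output (assumption)

mean_gap, corr_gap = compare(real, synthetic)
print(f"largest mean gap: {mean_gap:.3f}")
print(f"largest correlation gap: {corr_gap:.3f}")
```

Small gaps on both measures would suggest that the synthetic table preserves first-order statistics and linear dependence between columns. They say nothing about privacy leakage or higher-order structure, which need separate checks.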
Pages: 667-685
Number of pages: 19