Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment

Cited by: 5
Authors
Ramzan, Faisal [1]
Sartori, Claudio [2]
Consoli, Sergio [3]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy
[2] Univ Bologna, Dept Comp Sci & Engn, I-40126 Bologna, Italy
[3] European Commiss, Joint Res Ctr DG JRC, Brussels, Belgium
Keywords
generative adversarial networks; deep learning; data augmentation; synthetic data; big data
DOI
10.3390/ai5020035
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generating synthetic data is a complex task that necessitates accurately replicating the statistical and mathematical properties of the original data elements. In sectors such as finance, utilizing and disseminating real data for research or model development can pose substantial privacy risks owing to the inclusion of sensitive information. Additionally, authentic data may be scarce, particularly in specialized domains where acquiring ample, varied, and high-quality data is difficult or costly. This scarcity or limited data availability can limit the training and testing of machine-learning models. In this paper, we address this challenge. In particular, our task is to synthesize a dataset with similar properties to an input dataset about the stock market. The input dataset is anonymized and consists of very few columns and rows, contains many inconsistencies, such as missing rows and duplicates, and its values are not normalized, scaled, or balanced. We explore the utilization of generative adversarial networks, a deep-learning technique, to generate synthetic data and evaluate its quality compared to the input stock dataset. Our innovation involves generating artificial datasets that mimic the statistical properties of the input elements without revealing complete information. For example, synthetic datasets can capture the distribution of stock prices, trading volumes, and market trends observed in the original dataset. The generated datasets cover a wider range of scenarios and variations, enabling researchers and practitioners to explore different market conditions and investment strategies. This diversity can enhance the robustness and generalization of machine-learning models. We evaluate our synthetic data in terms of the mean, similarities, and correlations.
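The abstract names the mean and correlations among the evaluation criteria. As a rough illustration only, the sketch below compares a real and a synthetic numeric table on per-column means and on their Pearson correlation matrices. It is not the authors' evaluation code: the function name `compare_datasets`, the two-column Gaussian stand-in data, and the choice of the maximum absolute gap as a summary are all assumptions made for this example, and the "similarities" measure mentioned in the abstract is not reproduced because the record does not define it.

```python
import numpy as np


def compare_datasets(real, synthetic):
    """Compare two (rows x columns) numeric arrays column by column.

    Hypothetical helper for illustration; not taken from the paper.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Absolute difference between the per-column means.
    mean_gap = np.abs(real.mean(axis=0) - synthetic.mean(axis=0))

    # Pearson correlation matrices between columns (rowvar=False: columns are variables).
    corr_real = np.corrcoef(real, rowvar=False)
    corr_syn = np.corrcoef(synthetic, rowvar=False)

    # Largest absolute entry-wise difference between the two correlation matrices.
    max_corr_gap = float(np.abs(corr_real - corr_syn).max())

    return {"mean_gap": mean_gap, "max_corr_gap": max_corr_gap}


# Stand-in data (assumption): two correlated columns drawn from the same distribution,
# playing the roles of the "real" and "synthetic" tables.
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
real = rng.multivariate_normal([100.0, 5.0], cov, size=2000)
synthetic = rng.multivariate_normal([100.0, 5.0], cov, size=2000)

report = compare_datasets(real, synthetic)
print(report)
```

Small gaps on both measures suggest that the synthetic table reproduces the first-order statistics and the linear dependence structure of the real one; they do not by themselves say anything about privacy leakage or higher-order structure.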
Pages: 667-685
Number of pages: 19
Related papers
50 in total
  • [21] Synthetic Traffic Sign Image Generation Applying Generative Adversarial Networks
    Dewi, Christine
    Chen, Rung-Ching
    Liu, Yan-Ting
    VIETNAM JOURNAL OF COMPUTER SCIENCE, 2022, 09 (03) : 333 - 348
  • [22] Generative Adversarial Networks for Synthetic Defect Generation in Assembly and Test Manufacturing
    Singh, Rajhans
    Garg, Ravi
    Patel, Nital S.
    Braun, Martin W.
    2020 31ST ANNUAL SEMI ADVANCED SEMICONDUCTOR MANUFACTURING CONFERENCE (ASMC), 2020,
  • [23] Generative Adversarial Networks for Data Generation in Structural Health Monitoring
    Luleci, Furkan
    Catbas, F. Necati
    Avci, Onur
    FRONTIERS IN BUILT ENVIRONMENT, 2022, 8
  • [24] An overview of biological data generation using generative adversarial networks
    Liu, Lin
    Xia, Yujing
    Tang, Lin
    2020 IEEE CONFERENCE ON TELECOMMUNICATIONS, OPTICS AND COMPUTER SCIENCE (TOCS), 2020, : 141 - 144
  • [25] Geolocated Data Generation and Protection Using Generative Adversarial Networks
    Alatrista-Salas, Hugo
    Montalvo-Garcia, Peter
    Nunez-del-Prado, Miguel
    Salas, Julián
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, 13408 LNAI : 80 - 91
  • [26] Automated Software Test Data Generation With Generative Adversarial Networks
    Guo, Xiujing
    Okamura, Hiroyuki
    Dohi, Tadashi
    IEEE ACCESS, 2022, 10 : 20690 - 20700
  • [27] TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks
    Rajabi, Amirarsalan
    Garibay, Ozlem Ozmen
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2022, 4 (02) : 488 - 501
  • [28] Geolocated Data Generation and Protection Using Generative Adversarial Networks
    Alatrista-Salas, Hugo
    Montalvo-Garcia, Peter
    Nunez-del-Prado, Miguel
    Salas, Julian
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, MDAI 2022, 2022, 13408 : 80 - 91
  • [29] Synthetic Dynamic PMU Data Generation: A Generative Adversarial Network Approach
    Zheng, Xiangtian
    Wang, Bin
    Xie, Le
    2019 INTERNATIONAL CONFERENCE ON SMART GRID SYNCHRONIZED MEASUREMENTS AND ANALYTICS (SGSMA), 2019,
  • [30] Full-Scale Continuous Synthetic Sonar Data Generation with Markov Conditional Generative Adversarial Networks
    Jegorova, Marija
    Karjalainen, Antti Ilari
    Vazquez, Jose
    Hospedales, Timothy
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 3168 - 3174