Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment

Cited by: 5
Authors
Ramzan, Faisal [1]
Sartori, Claudio [2]
Consoli, Sergio [3]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy
[2] Univ Bologna, Dept Comp Sci & Engn, I-40126 Bologna, Italy
[3] European Commiss, Joint Res Ctr DG JRC, Brussels, Belgium
Keywords
generative adversarial networks; deep learning; data augmentation; synthetic data; big data
DOI
10.3390/ai5020035
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Generating synthetic data is a complex task that necessitates accurately replicating the statistical and mathematical properties of the original data elements. In sectors such as finance, utilizing and disseminating real data for research or model development can pose substantial privacy risks owing to the inclusion of sensitive information. Additionally, authentic data may be scarce, particularly in specialized domains where acquiring ample, varied, and high-quality data is difficult or costly. This scarcity or limited data availability can limit the training and testing of machine-learning models. In this paper, we address this challenge. In particular, our task is to synthesize a dataset with similar properties to an input dataset about the stock market. The input dataset is anonymized and consists of very few columns and rows, contains many inconsistencies, such as missing rows and duplicates, and its values are not normalized, scaled, or balanced. We explore the utilization of generative adversarial networks, a deep-learning technique, to generate synthetic data and evaluate its quality compared to the input stock dataset. Our innovation involves generating artificial datasets that mimic the statistical properties of the input elements without revealing complete information. For example, synthetic datasets can capture the distribution of stock prices, trading volumes, and market trends observed in the original dataset. The generated datasets cover a wider range of scenarios and variations, enabling researchers and practitioners to explore different market conditions and investment strategies. This diversity can enhance the robustness and generalization of machine-learning models. We evaluate our synthetic data in terms of the mean, similarities, and correlations.
Pages: 667-685
Number of pages: 19