A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliations
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; condition GAN; WGAN-GP; PacGAN;
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the advent of information technology, the volume of data generated online has grown massively. Recommendation systems have become an effective tool for filtering information and addressing information overload. The machine learning algorithms used to build these systems require data that is well balanced in class distribution, but real-world datasets are mostly imbalanced. Imbalanced data causes a classifier to focus on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes, but generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach that addresses the data imbalance problem in order to improve the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty (WGAN-GP) to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator architecture following PacGAN, so that it receives m packed samples as input instead of a single sample. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated the model in two ways: first on the quality of the generated data, and second on how different recommendation models perform using the generated data compared with the original data.
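The two discriminator-side ideas named in the abstract, PacGAN-style packing and the WGAN-GP critic loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pack_samples`, `critic_score`, `wgan_gp_critic_loss`) are hypothetical, the critic is assumed to be linear, D(x) = w · x, so that its input gradient is simply w and no automatic differentiation is needed, and the penalty weight `lam=10.0` is an assumed value. In a real model the gradient would be evaluated at random interpolations between real and generated samples, and the conditional input and auxiliary classifier loss would be added on top.

```python
import math


def pack_samples(batch, m):
    """Concatenate every m consecutive rows into one packed row (PacGAN-style)."""
    if m <= 0 or len(batch) % m != 0:
        raise ValueError("batch size must be a positive multiple of m")
    return [
        [value for row in batch[i:i + m] for value in row]
        for i in range(0, len(batch), m)
    ]


def critic_score(w, x):
    """Score of a toy linear critic D(x) = w . x (an assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def wgan_gp_critic_loss(w, real_packed, fake_packed, lam=10.0):
    """Critic loss: E[D(fake)] - E[D(real)] + lam * (||grad D|| - 1)^2.

    For a linear critic the gradient with respect to the input equals w
    everywhere, so the penalty does not depend on the interpolation point.
    """
    mean_real = sum(critic_score(w, x) for x in real_packed) / len(real_packed)
    mean_fake = sum(critic_score(w, x) for x in fake_packed) / len(fake_packed)
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    penalty = lam * (grad_norm - 1.0) ** 2
    return mean_fake - mean_real + penalty


# Four 2-feature rows packed with m = 2 become two 4-feature rows.
real = pack_samples([[1, 2], [3, 4], [5, 6], [7, 8]], m=2)
fake = pack_samples([[0, 0], [0, 0], [2, 0], [0, 0]], m=2)
loss = wgan_gp_critic_loss([1.0, 0.0, 0.0, 0.0], real, fake)
```

Because the critic scores m samples jointly, a generator that keeps producing near-identical rows yields packed inputs that are easy to tell apart from packed real data, which is the intuition behind using packing against mode collapse.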
Pages: 11036-11047
Page count: 12