A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliations
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; condition GAN; WGAN-GP; PacGAN
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the advent of information technology, the volume of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and mitigating information overload. The machine learning algorithms used to build these systems require data that is well balanced in terms of class distribution, but real-world datasets are mostly imbalanced. Imbalanced data causes a classifier to focus on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes; however, generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach that addresses the data imbalance problem in order to enhance the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator architecture following the concept of PacGAN, so that it receives m packed samples as input instead of a single sample. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated our model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform using the generated data compared with the original data.
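The packing step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `pack_samples`, the batch shape, and the packing degree `m=4` are assumptions chosen for illustration. The sketch only shows how m samples from the same source (all real or all generated) might be concatenated into one discriminator input, which is the core idea behind PacGAN.

```python
import numpy as np


def pack_samples(batch: np.ndarray, m: int) -> np.ndarray:
    """Concatenate every m consecutive rows of `batch` into one packed row.

    `batch` has shape (n, d); the result has shape (n // m, m * d).
    Hypothetical helper for illustration, not part of any library.
    """
    n, d = batch.shape
    if n % m != 0:
        raise ValueError("batch size must be divisible by the packing degree m")
    # A C-contiguous reshape places rows i*m .. i*m + m - 1 side by side.
    return batch.reshape(n // m, m * d)


rng = np.random.default_rng(0)
generated = rng.normal(size=(8, 5))   # 8 synthetic rows with 5 features each
packed = pack_samples(generated, m=4)
print(packed.shape)                   # (2, 20)
```

A discriminator that scores packed rows sees several samples jointly, so a generator that keeps producing near-identical rows becomes easier to detect. In a conditional setting, the rows packed together would presumably also need to share the same class condition; that detail is an assumption here and is not stated in the abstract.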
Pages: 11036-11047
Page count: 12