A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliation
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; conditional GAN; WGAN-GP; PacGAN
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the advent of information technology, the amount of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and addressing information overload. The machine learning algorithms used to build these systems require data that is well balanced across classes, but real-world datasets are mostly imbalanced. Imbalanced data causes a classifier to focus on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes, but generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach that addresses the data imbalance problem to improve the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty (WGAN-GP) to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator following the PacGAN concept, so that it receives m-packed samples (m samples concatenated into a single input) instead of one sample at a time. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated the model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform using the generated data compared with the original data.
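Two discriminator-side ingredients named in the abstract, PacGAN packing and the WGAN-GP gradient penalty, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-layer critic `toy_critic` (f(x) = tanh(x·w), chosen only so its input gradient can be written analytically instead of via autograd), the pack size m = 4, the penalty weight λ = 10, and the random data are all illustrative assumptions, and the conditional vector and auxiliary classifier loss are omitted.

```python
import numpy as np


def pack_samples(x: np.ndarray, m: int) -> np.ndarray:
    """PacGAN-style packing: concatenate m consecutive rows along the feature axis.

    (batch, features) -> (batch // m, m * features). Real and generated batches
    are packed separately, so every pack is all-real or all-generated.
    """
    batch, features = x.shape
    if batch % m != 0:
        raise ValueError(f"batch size {batch} is not divisible by pack size {m}")
    return x.reshape(batch // m, m * features)


def toy_critic(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical one-layer critic f(x) = tanh(x @ w): one score per packed row."""
    return np.tanh(x @ w)


def toy_critic_grad(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Analytic input gradient of toy_critic for each row: (1 - tanh^2(x @ w)) * w."""
    s = np.tanh(x @ w)
    return (1.0 - s**2)[:, None] * w[None, :]


def gradient_penalty(real_p, fake_p, w, rng, lam=10.0):
    """WGAN-GP term: lam * mean((||grad f(x_hat)||_2 - 1)^2) on random interpolates."""
    eps = rng.uniform(size=(real_p.shape[0], 1))  # one mixing weight per pack
    x_hat = eps * real_p + (1.0 - eps) * fake_p
    grad_norm = np.linalg.norm(toy_critic_grad(x_hat, w), axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)


def critic_loss(real_p, fake_p, w, rng, lam=10.0):
    """Critic loss to minimize: E[f(fake)] - E[f(real)] + gradient penalty."""
    wasserstein = toy_critic(fake_p, w).mean() - toy_critic(real_p, w).mean()
    return wasserstein + gradient_penalty(real_p, fake_p, w, rng, lam)


# Usage with made-up data: 8 rows of 5 already-encoded features per batch.
rng = np.random.default_rng(0)
real = rng.normal(size=(8, 5))
fake = rng.normal(size=(8, 5))  # stands in for generator output
m = 4  # hypothetical pack size
real_p, fake_p = pack_samples(real, m), pack_samples(fake, m)  # shape (2, 20)
w = 0.1 * rng.normal(size=real_p.shape[1])
print(real_p.shape, critic_loss(real_p, fake_p, w, rng))
```

Packing is only a reshape, so each critic input is a whole group of m real or m generated rows. A critic that sees several samples at once can detect a generator whose outputs lack diversity, which is the intuition PacGAN relies on against mode collapse.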
Pages: 11036-11047
Page count: 12
Related papers (50 in total)
  • [41] CDBH: A clustering and density-based hybrid approach for imbalanced data classification
    Mirzaei, Behzad
    Nikpour, Bahareh
    Nezamabadi-pour, Hossein
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
  • [42] GAN-Based Synthetic Medical Image Augmentation for Class Imbalanced Dermoscopic Image Analysis
    Alshardan, Amal
    Alahmari, Saad
    Alghamdi, Mohammed
    AL Sadig, Mutasim
    Mohamed, Abdullah
    Mohammed, Gouse Pasha
    FRACTALS-COMPLEX GEOMETRY PATTERNS AND SCALING IN NATURE AND SOCIETY, 2025
  • [43] An Improved Approach to Detection of Rice Leaf Disease with GAN-Based Data Augmentation Pipeline
    Haruna, Yunusa
    Qin, Shiyin
    Kiki, Mesmin J. Mbyamm J.
    APPLIED SCIENCES-BASEL, 2023, 13 (03)
  • [44] A Hybrid Sampling SVM Approach to Imbalanced Data Classification
    Wang, Qiang
    ABSTRACT AND APPLIED ANALYSIS, 2014
  • [45] A Knowledge-Enhanced Deep Recommendation Framework Incorporating GAN-based Models
    Yang, Deqing
    Guo, Zikai
    Wang, Ziyi
    Jiang, Junyang
    Xiao, Yanghua
    Wang, Wei
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018: 1368-1373
  • [46] Imbalanced Data Classification Based on Hybrid Methods
    Zhang, Nai-Nan
    Ye, Shao-Zhen
    Chien, Ting-Ying
    PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA RESEARCH (ICBDR 2018), 2018: 16-20
  • [47] Context-aware GAN-based knowledge recommendation method in engineering field
    Wang L.
    Jiang Z.
    Niu J.
    Huang Y.
    Li X.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2022, 28 (03): 798-811
  • [48] GAN-based data reconstruction attacks in split learning
    Zeng, Bo
    Luo, Sida
    Yu, Fangchao
    Yang, Geying
    Zhao, Kai
    Wang, Lina
    NEURAL NETWORKS, 2025, 185
  • [49] Privacy preservation for image data: A GAN-based method
    Chen, Zhenfei
    Zhu, Tianqing
    Xiong, Ping
    Wang, Chenguang
    Ren, Wei
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (04): 1668-1685
  • [50] GAN-based one dimensional medical data augmentation
    Ye Zhang
    Zhixiang Wang
    Zhen Zhang
    Junzhuo Liu
    Ying Feng
    Leonard Wee
    Andre Dekker
    Qiaosong Chen
    Alberto Traverso
    Soft Computing, 2023, 27: 10481-10491