A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliations
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; condition GAN; WGAN-GP; PacGAN;
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
With the advent of information technology, the volume of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and addressing information overload. The machine learning algorithms used to build these recommendation systems require data that is well balanced in terms of class distribution, but real-world datasets are mostly imbalanced. Imbalanced data pushes a classifier to focus on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes. Still, generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that keeps the probability distribution of the original data intact. In this paper, we propose a hybrid GAN approach to solve the data imbalance problem and enhance the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator architecture with the concept of PacGAN, so that it receives m packed samples as input instead of a single sample. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated our model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform using the generated data compared with the original data.
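The packing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pack_samples`, the toy batch, and the choice m = 2 are assumptions made for illustration only. The sketch shows how m consecutive rows (feature values followed by a class label) could be concatenated into one discriminator input, which lets a discriminator judge several samples jointly.

```python
from typing import List, Sequence


def pack_samples(rows: Sequence[Sequence[float]], m: int) -> List[List[float]]:
    """Concatenate every m consecutive rows into one packed input (hypothetical helper)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    # Drop trailing rows that do not fill a complete pack.
    usable = len(rows) - len(rows) % m
    packed: List[List[float]] = []
    for start in range(0, usable, m):
        pack: List[float] = []
        for row in rows[start:start + m]:
            pack.extend(row)  # append this row's values to the current pack
        packed.append(pack)
    return packed


# Toy batch: each row is [feature, class_label]; the values are made up.
batch = [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0], [0.5, 1.0]]
packed = pack_samples(batch, m=2)
```

With five rows and m = 2, the sketch yields two packed inputs of four values each and discards the fifth row. In a real pipeline, each pack would normally hold either all real or all generated samples.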
Pages: 11036 - 11047
Page count: 12
Related papers
50 in total
  • [21] Antenna Design Using a GAN-Based Synthetic Data Generation Approach
    Noakoasteen, Oameed
    Vijayamohanan, Jayakrishnan
    Gupta, Arjun
    Christodoulou, Christos
    IEEE OPEN JOURNAL OF ANTENNAS AND PROPAGATION, 2022, 3 : 488 - 494
  • [22] Improving fault diagnosis in elevator systems with GAN-based synthetic data
    Lv, Xiaomei
    Lu, Zhibin
    Huang, Zhihao
    Wei, Zhanhao
    INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2025, 47 (01)
  • [23] A Hybrid Method to Solve Data Sparsity in Travel Recommendation Agents Using Fuzzy Logic Approach
    Nilashi, Mehrbakhsh
    Abumalloh, Rabab Ali
    Alrizq, Mesfer
    Almulihi, Ahmed
    Alghamdi, O. A.
    Farooque, Murtaza
    Samad, Sarminah
    Mohd, Saidatulakmal
    Ahmadi, Hossein
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [24] Enhancing Wireless Data Transmission: A GAN-based Approach for Time Series Data Restoration
    Han, Daejin
    Na, Woongsoo
    38TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN 2024, 2024, : 429 - 433
  • [25] A Hybrid Method to Solve Data Sparsity in Travel Recommendation Agents Using Fuzzy Logic Approach
    Ucsi Graduate Business School, Ucsi University, No. 1 Jalan Menara Gading, UCSI Heights, Cheras, Kuala Lumpur 56000, Malaysia
    Unknown
    Unknown
    Unknown
    Unknown
    21944, Saudi Arabia
    Unknown
    Unknown
    Unknown
    Unknown
    PL4 8AA, United Kingdom
    Math. Probl. Eng.,
  • [26] A cluster-based hybrid sampling approach for imbalanced data classification
    Feng, Shou
    Zhao, Chunhui
    Fu, Ping
    REVIEW OF SCIENTIFIC INSTRUMENTS, 2020, 91 (05):
  • [27] Permuted KPCA and SMOTE to Guide GAN-Based Oversampling for Imbalanced HSI Classification
    Miftahushudur, Tajul
    Grieve, Bruce
    Yin, Hujun
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 489 - 505
  • [28] A Hybrid Recommendation Approach Based on Social Tagging Data Preprocession
    Zhao, Haiyan
    Guo, Di
    Chen, Qingkui
    Gao, Liping
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (PIC), 2014, : 185 - 189
  • [29] A Hybrid Approach for Binary Classification of Imbalanced Data
    Tsai, Hsinhan
    Yang, Ta-Wei
    Wong, Wai-Man
    Kao, Han-Yi
    Chou, Cheng-Fu
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2024, 23 (03)
  • [30] GAN-Based Bearing Fault Diagnosis Method for Short and Imbalanced Vibration Signal
    Bai, Guoli
    Sun, Wei
    Cao, Cong
    Wang, Dongfeng
    Sun, Qingchao
    Sun, Liang
    IEEE SENSORS JOURNAL, 2024, 24 (02) : 1894 - 1904