Resampling Effects on Imbalanced Data in Network Intrusion Classification

Times cited: 0
Authors
Rahma, Fayruz [1]
Rajasa, Mahesa Cadi [2]
Rachmadi, Reza Fuad [3]
Pratomo, Baskoro Adi [4]
Purnomo, Mauridhi Hery [3]
Affiliations
[1] Inst Teknol Sepuluh Nopember, Dept Elect Engn, Surabaya, Indonesia
[2] Univ Islam Indonesia, Dept Informat, Yogyakarta, Indonesia
[3] Inst Teknol Sepuluh Nopember, Dept Comp Engn, Dept Elect Engn, Surabaya, Indonesia
[4] Inst Teknol Sepuluh Nopember, Dept Informat, Surabaya, Indonesia
Source
2024 INTERNATIONAL ELECTRONICS SYMPOSIUM, IES 2024 | 2024
Keywords
imbalance data; network intrusion detection; resampling techniques
DOI
10.1109/IES63037.2024.10665861
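Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809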
Abstract
The rapid expansion of network connections has significantly increased network traffic activity, introducing new cybersecurity challenges and heightened vulnerability to cyber attacks. To address these challenges, researchers have leveraged intelligent techniques such as machine learning (ML) to enhance attack detection accuracy in network traffic. However, ML models often face a data imbalance issue in their training sets. This imbalance, typically due to the uneven distribution of attack classes, hampers the classification performance of ML models in network intrusion detection. To mitigate class imbalance, various resampling techniques can be employed. This study evaluates several resampling techniques, including Random Oversampling, SMOTE, ADASYN, Random Undersampling, Tomek Links, and SMOTE-Tomek. Using the UNSW-NB15 dataset, we trained and tested ML models, including Decision Tree, Random Forest, Gradient Boosting, XGBoost, and 1D-CNN algorithms. Our analysis demonstrates that resampling techniques significantly impact the performance of machine learning models. The Tomek Links technique applied to the 1D-CNN model achieved the highest performance, with an accuracy of 75.27%, a precision of 87.58%, and an F1-score of 76.22%. Notably, the best recall score of 67.57% was obtained from the 1D-CNN model without resampling. These findings provide valuable insights for researchers and engineers, aiding in selecting appropriate resampling techniques for developing robust detection models for network traffic attacks.
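As an illustration of two of the resampling techniques named in the abstract, the following is a minimal pure-Python sketch of Random Oversampling and Tomek Links on a toy one-dimensional dataset. It is not the authors' implementation and does not use UNSW-NB15. The function names (`random_oversample`, `tomek_link_indices`), the toy data, and the choice to drop the Tomek-link member that belongs to the larger class are all assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _nearest(X, i):
    """Index of the nearest neighbour of sample i (brute force)."""
    others = [j for j in range(len(X)) if j != i]
    return min(others, key=lambda j: _sq_dist(X[i], X[j]))


def tomek_link_indices(X, y):
    """Find Tomek links: pairs of samples from different classes that are
    each other's nearest neighbour. Return the indices to drop, here the
    member of each pair whose class is larger (an assumed convention)."""
    counts = Counter(y)
    nn = [_nearest(X, i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j] and counts[y[i]] > counts[y[j]]:
            drop.add(i)
    return sorted(drop)


# Toy data (hypothetical): class 0 is the majority, class 1 the minority.
X = [(0.0,), (0.3,), (0.5,), (0.9,), (1.0,), (2.0,)]
y = [0, 0, 0, 0, 1, 1]

# Samples 3 (0.9, class 0) and 4 (1.0, class 1) are mutual nearest
# neighbours with different labels, so they form a Tomek link.
to_drop = tomek_link_indices(X, y)
print(to_drop)  # [3]

X_over, y_over = random_oversample(X, y)
print(Counter(y_over))
```

Random Oversampling only repeats existing minority samples, whereas Tomek Links removes majority samples that sit right on the class boundary, so the two techniques change the training set in opposite directions.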
Pages: 534-540
Page count: 7
Related papers
50 in total
  • [21] Review of resampling techniques for the treatment of imbalanced industrial data classification in equipment condition monitoring
    Yuan, Yage
    Wei, Jianan
    Huang, Haisong
    Jiao, Weidong
    Wang, Jiaxin
    Chen, Hualin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [23] Knowledge distillation with resampling for imbalanced data classification: Enhancing predictive performance and explainability stability
    Fujiwara, Kazuki
    RESULTS IN ENGINEERING, 2024, 24
  • [24] Majority-to-minority resampling for boosting-based classification under imbalanced data
    Wang, Gaoshan
    Wang, Jian
    He, Kejing
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4541 - 4562
  • [25] Value-Aware Resampling and Loss for Imbalanced Classification
    Sun, Li
    Song, Jie
    Hua, Cheng
    Shen, Chengchao
    Song, Mingli
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2018), 2018
  • [26] Resampling Imbalanced Healthcare Data for Predictive Modelling
    Mamilla, Manoj Yadav
    Al-Haddad, Ronak
    Chowdhury, Stiphen
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (02) : 36 - 44
  • [27] A novel data-driven integrated detection method for network intrusion classification based on multi-feature imbalanced data
    Wang, Chia-Hung
    Ye, Qing
    Cai, Jiongbiao
    Suo, Yifan
    Lin, Shengming
    Yuan, Jinchen
    Wu, Xiaojing
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (03) : 5893 - 5910
  • [28] Addressing Imbalanced Data Problem with Generative Adversarial Network For Intrusion Detection
    Yilmaz, Ibrahim
    Masum, Rahat
    Siraj, Ambareen
    2020 IEEE 21ST INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2020), 2020 : 25 - 30
  • [29] Oversampling for Imbalanced Data Classification Using Adversarial Network
    Lee, Sang-Kwang
    Hong, Seung-Jin
    Yang, Seong-Il
    2018 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC), 2018 : 1255 - 1257
  • [30] RBSP-Boosting: A Shapley value-based resampling approach for imbalanced data classification
    Chong, Weitu
    Chen, Ningjiang
    Fang, Chengyun
    INTELLIGENT DATA ANALYSIS, 2022, 26 (06) : 1579 - 1595