Resampling Effects on Imbalanced Data in Network Intrusion Classification

Citations: 0
Authors
Rahma, Fayruz [1]
Rajasa, Mahesa Cadi [2]
Rachmadi, Reza Fuad [3]
Pratomo, Baskoro Adi [4]
Purnomo, Mauridhi Hery [3]
Affiliations
[1] Inst Teknol Sepuluh Nopember, Dept Elect Engn, Surabaya, Indonesia
[2] Univ Islam Indonesia, Dept Informat, Yogyakarta, Indonesia
[3] Inst Teknol Sepuluh Nopember, Dept Comp Engn, Dept Elect Engn, Surabaya, Indonesia
[4] Inst Teknol Sepuluh Nopember, Dept Informat, Surabaya, Indonesia
Source
2024 International Electronics Symposium, IES 2024 | 2024
Keywords
imbalance data; network intrusion detection; resampling techniques
DOI
10.1109/IES63037.2024.10665861
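Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract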
The rapid expansion of network connections has significantly increased network traffic activity, introducing new cybersecurity challenges and heightened vulnerability to cyber attacks. To address these challenges, researchers have leveraged intelligent techniques such as machine learning (ML) to enhance attack detection accuracy in network traffic. However, ML models often face a data imbalance issue in their training sets. This imbalance, typically due to the uneven distribution of attack classes, hampers the classification performance of ML models in network intrusion detection. To mitigate class imbalance, various resampling techniques can be employed. This study evaluates several resampling techniques, including Random Oversampling, SMOTE, ADASYN, Random Undersampling, Tomek Links, and SMOTE-Tomek. Using the UNSW-NB15 dataset, we trained and tested ML models, including Decision Tree, Random Forest, Gradient Boosting, XGBoost, and 1D-CNN algorithms. Our analysis demonstrates that resampling techniques significantly impact the performance of machine learning models. The Tomek Links technique applied to the 1D-CNN model achieved the highest performance, with an accuracy of 75.27%, a precision of 87.58%, and an F1-score of 76.22%. Notably, the best recall score of 67.57% was obtained from the 1D-CNN model without resampling. These findings provide valuable insights for researchers and engineers, aiding in selecting appropriate resampling techniques for developing robust detection models for network traffic attacks.
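
To make the resampling families named in the abstract concrete, here is a minimal standard-library sketch of three of them: Random Oversampling, Random Undersampling, and Tomek Links removal. It is not the authors' code. The toy one-feature dataset, the labels, and every function name are assumptions chosen for illustration, and no UNSW-NB15 data is used. In practice, libraries such as imbalanced-learn provide tested implementations of these techniques, along with SMOTE, ADASYN, and SMOTE-Tomek.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes at random until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def tomek_links(X, y):
    """Return index pairs that are mutual nearest neighbours with different labels."""
    def nearest(i):
        # Squared Euclidean distance; ties resolve to the lowest index.
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y):
    """Drop the majority-class member of every Tomek link (a common convention)."""
    majority = Counter(y).most_common(1)[0][0]
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: six "normal" flows and two "attack" flows, one feature each.
X = [(0.0,), (0.5,), (1.0,), (1.5,), (2.0,), (2.9,), (3.0,), (4.0,)]
y = ["normal"] * 6 + ["attack"] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
links = tomek_links(X, y)
X_tl, y_tl = remove_tomek_majority(X, y)

print("original:    ", dict(Counter(y)))
print("oversampled: ", dict(Counter(y_over)))
print("undersampled:", dict(Counter(y_under)))
print("Tomek links: ", links)
print("after Tomek: ", dict(Counter(y_tl)))
```

In this sketch, oversampling and undersampling both equalize the class counts, whereas Tomek Links removal only cleans the class boundary and leaves the data imbalanced. Resampling is normally applied to the training split only, so that test metrics reflect the original class distribution.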
Pages: 534-540
Number of pages: 7