Resampling Effects on Imbalanced Data in Network Intrusion Classification

Cited by: 0

Authors:

Rahma, Fayruz [1]

Rajasa, Mahesa Cadi [2]

Rachmadi, Reza Fuad [3]

Pratomo, Baskoro Adi [4]

Purnomo, Mauridhi Hery [3]

Affiliations:

[1] Inst Teknol Sepuluh Nopember, Dept Elect Engn, Surabaya, Indonesia

[2] Univ Islam Indonesia, Dept Informat, Yogyakarta, Indonesia

[3] Inst Teknol Sepuluh Nopember, Dept Comp Engn, Dept Elect Engn, Surabaya, Indonesia

[4] Inst Teknol Sepuluh Nopember, Dept Informat, Surabaya, Indonesia

Source:

2024 INTERNATIONAL ELECTRONICS SYMPOSIUM, IES 2024 | 2024

Keywords:

imbalance data; network intrusion detection; resampling techniques

DOI:

10.1109/IES63037.2024.10665861

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Subject classification codes:

0808; 0809

Abstract:

The rapid expansion of network connections has significantly increased network traffic activity, introducing new cybersecurity challenges and heightened vulnerability to cyber attacks. To address these challenges, researchers have leveraged intelligent techniques such as machine learning (ML) to enhance attack detection accuracy in network traffic. However, ML models often face a data imbalance issue in their training sets. This imbalance, typically due to the uneven distribution of attack classes, hampers the classification performance of ML models in network intrusion detection. To mitigate class imbalance, various resampling techniques can be employed. This study evaluates several resampling techniques, including Random Oversampling, SMOTE, ADASYN, Random Undersampling, Tomek Links, and SMOTE-Tomek. Using the UNSW-NB15 dataset, we trained and tested ML models, including Decision Tree, Random Forest, Gradient Boosting, XGBoost, and 1D-CNN algorithms. Our analysis demonstrates that resampling techniques significantly impact the performance of machine learning models. The Tomek Links technique applied to the 1D-CNN model achieved the highest performance, with an accuracy of 75.27%, a precision of 87.58%, and an F1-score of 76.22%. Notably, the best recall score of 67.57% was obtained from the 1D-CNN model without resampling. These findings provide valuable insights for researchers and engineers, aiding in selecting appropriate resampling techniques for developing robust detection models for network traffic attacks.
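To make the resampling idea in the abstract concrete, the following is a minimal pure-Python sketch of three of the techniques it names: Random Oversampling, Random Undersampling, and Tomek Links. It is not the authors' implementation, which this record does not describe. The function names, the toy data, and the rule of dropping the larger-class member of each Tomek link are assumptions made for illustration only.

```python
import random
from collections import Counter


def _indices_by_class(y):
    """Group sample indices by class label (hypothetical helper)."""
    groups = {}
    for i, label in enumerate(y):
        groups.setdefault(label, []).append(i)
    return groups


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = max(len(idx) for idx in groups.values())
    X_out, y_out = list(X), list(y)
    for label, idx in groups.items():
        for _ in range(target - len(idx)):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = min(len(idx) for idx in groups.values())
    keep = []
    for idx in groups.values():
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def tomek_links_undersample(X, y):
    """Remove the larger-class member of every Tomek link.

    A Tomek link is a pair of samples with different labels that are each
    other's nearest neighbour. Assumption: only the member from the more
    frequent class is dropped; ties in class size drop nothing.
    """
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    n = len(X)
    # Brute-force nearest neighbour of each sample (excluding itself).
    nearest = [
        min((j for j in range(n) if j != i), key=lambda j: sqdist(X[i], X[j]))
        for i in range(n)
    ]
    counts = Counter(y)
    drop = set()
    for i in range(n):
        j = nearest[i]
        is_link = nearest[j] == i and y[i] != y[j]
        if is_link and counts[y[i]] > counts[y[j]]:
            drop.add(i)
    kept = [i for i in range(n) if i not in drop]
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy two-feature data (made up): class 0 is the majority, class 1 the minority.
TOY_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (0.5, 0.5), (0.2, 0.1), (1.0, 1.0), (0.55, 0.5)]
TOY_Y = [0, 0, 0, 0, 0, 0, 1, 1]

for name, fn in [("oversample", random_oversample),
                 ("undersample", random_undersample),
                 ("tomek", tomek_links_undersample)]:
    _, y_res = fn(TOY_X, TOY_Y)
    print(name, dict(Counter(y_res)))
```

Oversampling and undersampling change the class ratio directly, while Tomek Links only removes borderline majority samples, so it typically leaves most of the imbalance in place. SMOTE, ADASYN, and SMOTE-Tomek synthesize new minority samples and are left out of this sketch for brevity.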

Pages: 534-540

Page count: 7

