Feature Selection For Machine Learning-Based Early Detection of Distributed Cyber Attacks

Cited by: 19
Authors
Feng, Yaokai [1]
Akiyama, Hitoshi [2]
Lu, Liang [2, 4]
Sakurai, Kouichi [3]
Affiliations
[1] Kyushu Univ, Fac Adv Informat Technol, Fukuoka, Fukuoka, Japan
[2] Kyushu Univ, Dept Informat, Fukuoka, Fukuoka, Japan
[3] Kyushu Univ, Fac Informat, Fukuoka, Fukuoka, Japan
[4] Fujitsu Co Ltd, Fukuoka, Fukuoka, Japan
Funding
Japan Science and Technology Agency;
Keywords
distributed cyber attacks; DDoS attacks; machine learning; feature selection; early detection; CLASSIFICATION;
DOI
10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00040
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. How to detect those attacks at an early stage has therefore become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server is crucially important, because C&C communication takes place in the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot deal efficiently with new kinds of attacks. In recent years, ML (machine learning)-based detection methods have been studied widely. In those methods, feature selection is very important to detection performance. We previously used up to 55 features to pick out C&C traffic in order to achieve early detection of DDoS attacks. In this work, we try to answer the question "Are all of those features really necessary?" We mainly investigate how detection performance changes as features are removed, starting from those with the lowest importance, and we try to clarify which features deserve attention for early detection of distributed attacks. We use honeypot data collected from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are used for feature selection, and SVM and RF (Random Forest) are used to build the classifier. We find that detection performance generally improves as more features are used. However, once the number of features reaches around 40, detection performance does not change much even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance.
We also discuss 10 important features which have the biggest influence on classification.
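The procedure described in the abstract (rank features by importance, then remove them starting from the least important and watch how detection performance changes) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic data in place of the honeypot traffic, a small Pegasos-style linear SVM written from scratch for both ranking and classification, and absolute weight magnitude as the importance score; PCA and Random Forest are left out. All function names (make_data, train_linear_svm, elimination_curve) are hypothetical.

```python
import random


def make_data(n_samples, n_features, n_informative, seed=0):
    """Synthetic stand-in for traffic features; labels are +1 / -1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.choice([-1, 1])
        # Only the first n_informative columns carry class signal.
        row = [rng.gauss(label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Pegasos-style subgradient descent; no bias term (assumes centered data)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for xi, yi in zip(X, y)
               if yi * sum(wj * xj for wj, xj in zip(w, xi)) > 0)
    return hits / len(X)


def select(X, cols):
    """Keep only the given feature columns."""
    return [[row[c] for c in cols] for row in X]


def elimination_curve(X_tr, y_tr, X_te, y_te):
    """Rank features by |weight|, then retrain on the top-k for each k."""
    w = train_linear_svm(X_tr, y_tr)
    ranking = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    curve = {}
    for k in range(len(ranking), 0, -1):  # drop least important first
        cols = ranking[:k]
        w_k = train_linear_svm(select(X_tr, cols), y_tr)
        curve[k] = accuracy(w_k, select(X_te, cols), y_te)
    return ranking, curve


if __name__ == "__main__":
    X, y = make_data(400, 10, 3)
    ranking, curve = elimination_curve(X[:300], y[:300], X[300:], y[300:])
    print("features, most to least important:", ranking)
    for k in sorted(curve, reverse=True):
        print(f"top {k:2d} features -> test accuracy {curve[k]:.3f}")
```

Plotting accuracy against the number of retained features gives the kind of curve the abstract refers to, where performance flattens once enough features are included. The numbers produced here reflect the synthetic data only and say nothing about the paper's results.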
Pages: 173-180
Page count: 8