Feature Selection For Machine Learning-Based Early Detection of Distributed Cyber Attacks

Cited by: 19
Authors
Feng, Yaokai [1]
Akiyama, Hitoshi [2]
Lu, Liang [2, 4]
Sakurai, Kouichi [3]
Affiliations
[1] Kyushu Univ, Fac Adv Informat Technol, Fukuoka, Fukuoka, Japan
[2] Kyushu Univ, Dept Informat, Fukuoka, Fukuoka, Japan
[3] Kyushu Univ, Fac Informat, Fukuoka, Fukuoka, Japan
[4] Fujitsu Co Ltd, Fukuoka, Fukuoka, Japan
Funding
Japan Science and Technology Agency;
Keywords
distributed cyber attacks; DDoS attacks; machine learning; feature selection; early detection; CLASSIFICATION;
DOI
10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00040
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. How to detect those attacks at an early stage has therefore become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server is a crucially important issue, because C&C communication takes place in the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot deal efficiently with new kinds of attacks. In recent years, machine learning (ML)-based detection methods have been studied widely. In those methods, feature selection is very important to the detection performance. We previously used up to 55 features to pick out C&C traffic in order to achieve early detection of DDoS attacks. In this work, we try to answer the question "Are all of those features really necessary?" We mainly investigate how the detection performance changes as features are removed, starting from those with the lowest importance, and we try to clarify which features deserve attention for early detection of distributed attacks. We use honeypot data collected from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are used for feature selection, and SVM and RF (Random Forest) are used to build the classifier. We find that the detection performance generally improves as more features are used. However, once the number of features reaches around 40, the detection performance does not change much even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features that have the biggest influence on classification.
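The removal experiment described in the abstract (rank features by importance, drop them from the least important upward, and track detection performance) can be sketched roughly as follows. This is only an illustration under loud assumptions, not the authors' procedure: the data are synthetic rather than honeypot traffic, a Fisher-style score stands in for the SVM- and PCA-based ranking, and a nearest-centroid classifier stands in for the SVM and RF classifiers. All function names and parameters are hypothetical.

```python
import random
import statistics


def make_data(n_samples=400, n_features=12, n_informative=4, seed=0):
    """Synthetic two-class data (assumption): only the first
    n_informative features have different means in the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so any prefix split stays balanced
        row = [rng.gauss(1.5 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def fisher_scores(X, y):
    """Per-feature importance: squared class-mean gap over summed class
    variances. A stand-in for the SVM/PCA ranking named in the abstract."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        gap = statistics.fmean(a) - statistics.fmean(b)
        spread = statistics.pvariance(a) + statistics.pvariance(b) + 1e-12
        scores.append(gap * gap / spread)
    return scores


def centroid_accuracy(X_train, y_train, X_test, y_test, keep):
    """Accuracy of a nearest-centroid classifier (stand-in for SVM/RF)
    that only looks at the feature indices listed in keep."""
    centroids = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X_train, y_train) if l == lab]
        centroids[lab] = [statistics.fmean(r[j] for r in rows) for j in keep]
    correct = 0
    for row, lab in zip(X_test, y_test):
        dists = {c: sum((row[j] - m) ** 2 for j, m in zip(keep, cen))
                 for c, cen in centroids.items()}
        correct += min(dists, key=dists.get) == lab
    return correct / len(y_test)


def removal_curve(X, y, train_fraction=0.7):
    """Rank features on the training split, then evaluate with the top-k
    features for k = all, ..., 1 (least important features dropped first)."""
    split = int(len(X) * train_fraction)
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]
    scores = fisher_scores(X_train, y_train)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    curve = [(k, centroid_accuracy(X_train, y_train, X_test, y_test, ranked[:k]))
             for k in range(len(ranked), 0, -1)]
    return ranked, curve


X, y = make_data()
ranked, curve = removal_curve(X, y)
for k, acc in curve:
    print(f"{k:2d} features kept: accuracy = {acc:.3f}")
```

Reading the printed curve from top to bottom corresponds to the question posed in the abstract: how far the feature set can shrink before performance starts to fall. With real traffic features, the ranking and classifier would be replaced by the methods the abstract names.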
Pages: 173-180
Number of pages: 8
Related papers
50 in total
  • [21] Enhancing IoT Botnet Detection through Machine Learning-based Feature Selection and Ensemble Models
    Sharma, Ravi
    Din, Saika Mohi Ud
    Sharma, Nonita
    Kumar, Arun
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2024, 11 (02) : 1 - 6
  • [22] Adversarial attacks on machine learning-based cyber security systems: a survey of techniques and defences
    Patel, Pratik S.
    Panchal, Pooja
    INTERNATIONAL JOURNAL OF ELECTRONIC SECURITY AND DIGITAL FORENSICS, 2025, 17 (1-2)
  • [23] Security of Machine Learning-Based Anomaly Detection in Cyber Physical Systems
    Jadidi, Zahra
    Pal, Shantanu
    Nayak, Nithesh K.
    Selvakkumar, Arawinkumaar
    Chang, Chih-Chia
    Beheshti, Maedeh
    Jolfaei, Alireza
    2022 31ST INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN 2022), 2022,
  • [24] Machine Learning-Based Cyber-Attack Detection in Photovoltaic Farms
    Zhang, Jinan
    Guo, Lulu
    Ye, Jin
    Giani, Annarita
    Elasser, Ahmed
    Song, Wenzhan
    Liu, Jianzhe
    Chen, Bo
    Mantooth, H. Alan
    IEEE OPEN JOURNAL OF POWER ELECTRONICS, 2023, 4 : 658 - 673
  • [25] Sparse Kernel Learning-Based Feature Selection for Anomaly Detection
    Peng, Zhimin
    Gurram, Prudhvi
    Kwon, Heesung
    Yin, Wotao
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2015, 51 (03) : 1698 - 1716
  • [26] Ensemble learning-based feature selection for phosphorylation site detection
    Liu, Songbo
    Cui, Chengmin
    Chen, Huipeng
    Liu, Tong
    FRONTIERS IN GENETICS, 2022, 13
  • [27] An Ensemble Learning-Based Cyber-Attacks Detection Method of Cyber-Physical Power Systems
    Lu, Kang-Di
    Wu, Zheng-Guang
    2022 INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2022), 2022, : 1029 - 1034
  • [28] Learning-Based Attacks in Cyber-Physical Systems
    Khojasteh, Mohammad Javad
    Khina, Anatoly
    Franceschetti, Massimo
    Javidi, Tara
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2021, 8 (01) : 437 - 449
  • [30] Detection of power grid disturbances and cyber-attacks based on machine learning
    Wang, Defu
    Wang, Xiaojuan
    Zhang, Yong
    Jin, Lei
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2019, 46 : 42 - 52