A novel semi-supervised approach for network traffic clustering

Cited by: 29
Authors
Wang Y. [1]
Xiang Y. [1]
Zhang J. [1]
Yu S. [2]
Affiliations
[1] School of Information Technology, Deakin University, Melbourne
[2] Department of Electronic and Communication Engineering, Sun Yat-Sen University, Guangzhou
Keywords
constrained clustering; constraints; machine learning; semi-supervised learning; traffic classification
DOI
10.1109/ICNSS.2011.6059997
Abstract
Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is to apply machine learning techniques that classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that, in the network domain, a lot of background information is available in addition to the data instances themselves. For example, we might know that flows f1 and f2 use the same application protocol because they visit the same host address at the same port simultaneously. In this case, f1 and f2 should ideally be grouped into the same cluster. We therefore describe these correlations in the form of pair-wise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that incorporating constraints in the course of clustering can significantly improve the overall accuracy and cluster purity. © 2011 IEEE.
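The must-link idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a hard-constraint K-Means variant in the spirit of COP-KMeans, restricted to must-link constraints. The flow fields (`dst_ip`, `dst_port`, `proto`, `features`), the function names, and the toy data are all hypothetical, and the sketch ignores the "simultaneously" condition (no time window is checked). Flows that share a destination address, port, and protocol are merged into one group, and each group is assigned to a cluster as a unit.

```python
import random
from collections import defaultdict


def must_link_groups(flows):
    """Group flow indices that share (dst_ip, dst_port, proto).

    Assumption: sharing this 3-tuple implies the same application.
    A real system would also require the flows to overlap in time.
    """
    groups = defaultdict(list)
    for i, flow in enumerate(flows):
        groups[(flow["dst_ip"], flow["dst_port"], flow["proto"])].append(i)
    return list(groups.values())


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def constrained_kmeans(features, groups, k, iters=20, seed=0):
    """K-Means in which every must-link group is assigned as a unit."""
    rng = random.Random(seed)
    group_means = [mean([features[i] for i in g]) for g in groups]
    # Initialise centroids from group means (requires k <= number of groups).
    centroids = rng.sample(group_means, k)
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: the centroid nearest to the group mean also
        # minimises the group's summed squared distance to a centroid.
        for g, gm in zip(groups, group_means):
            c = min(range(k), key=lambda j: sq_dist(gm, centroids[j]))
            for i in g:
                labels[i] = c
        # Update step: recompute each centroid; keep it if its cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [features[i] for i in range(len(features)) if labels[i] == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy flows: features are [mean packet size, mean inter-arrival time].
flows = [
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp", "features": [500.0, 0.02]},
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp", "features": [90.0, 0.50]},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "proto": "udp", "features": [80.0, 0.60]},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "proto": "udp", "features": [85.0, 0.55]},
    {"dst_ip": "10.0.0.3", "dst_port": 80, "proto": "tcp", "features": [520.0, 0.03]},
]
groups = must_link_groups(flows)
labels, centroids = constrained_kmeans([f["features"] for f in flows], groups, k=2)
print(labels)
```

In this sketch, flows 0 and 1 always receive the same label even though their statistics differ, because the must-link constraint is enforced as a hard rule. The soft-constraint and metric-learning variants mentioned in the abstract would instead penalise violations or reweight the distance function, and real features would normally be scaled before clustering.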
Pages: 169-175
Page count: 6