An ensemble learning method with GAN-based sampling and consistency check for anomaly detection of imbalanced data streams with concept drift

Cited by: 2
Authors
Liu, Yansong [1, 2]
Wang, Shuang [3]
Sui, He [4]
Zhu, Li [1]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian, Shaanxi, Peoples R China
[2] Shandong Management Univ, Sch Intelligent Engn, Jinan, Shandong, Peoples R China
[3] Civil Aviat Univ China, Informat Secur Evaluat Ctr Civil Aviat, Tianjin, Peoples R China
[4] Civil Aviat Univ China, Coll Aeronaut Engn, Tianjin, Peoples R China
Source
PLOS ONE | 2024 / Vol. 19 / No. 01
Keywords
DYNAMIC WEIGHTED MAJORITY;
DOI
10.1371/journal.pone.0292140
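Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code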
07 ; 0710 ; 09 ;
Abstract
Class imbalance combined with concept drift is a challenge in many real-world data streams, and handling it is one of the most critical tasks in anomaly detection. Learning from nonstationary data streams for anomaly detection has been well studied in recent years. However, most studies assume that the class distribution of the data stream is relatively balanced, and only a few approaches tackle the joint issue of imbalance and concept drift. To overcome this joint issue, we propose an ensemble learning method with generative adversarial network-based sampling and consistency check (EGSCC). First, we design a comprehensive anomaly detection framework that includes a generative adversarial network (GAN)-based oversampling module, an ensemble classifier, and a consistency check module. Next, we introduce double encoders into the GAN to better capture the distribution characteristics of imbalanced data for oversampling. Then, we apply stacking ensemble learning to deal with concept drift: four base classifiers (SVM, KNN, DT and RF) are used in the first layer, and LR is used as the meta classifier in the second layer. Furthermore, we perform a consistency check between each incremental instance and the check set to determine, by statistical learning rather than a threshold-based method, whether the instance is anomalous. The validation set is dynamically updated according to the consistency check result. Finally, three artificial data sets obtained from the Massive Online Analysis platform and two real data sets are used to verify the performance of the proposed method from four aspects: detection performance, parameter sensitivity, algorithm cost and anti-noise ability. Experimental results show that the proposed method has significant advantages in anomaly detection of imbalanced data streams with concept drift.
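The two-layer stacking arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn is available, reads the abbreviations as support vector machine, k-nearest neighbors, decision tree, random forest and logistic regression, and uses a synthetic imbalanced data set in place of the paper's data streams. The helper name `build_stacking_classifier`, all hyperparameters, and the 90/10 class ratio are illustrative assumptions. The GAN-based oversampling and the consistency check are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacking_classifier(seed=0):
    """Hypothetical helper: four first-layer learners, LR as meta classifier."""
    base_learners = [
        ("svm", SVC(probability=True, random_state=seed)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
    ]
    # The meta classifier is trained on cross-validated predictions of the
    # base learners (cv=5), which is what makes this stacking rather than voting.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Synthetic, imbalanced stand-in data (roughly 10% positive/anomalous class).
# This is an assumption for illustration, not one of the paper's data sets.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = build_stacking_classifier()
model.fit(X_train, y_train)
pred = model.predict(X_test)

# F1 on the minority class is more informative than accuracy under imbalance.
print("Minority-class F1:", f1_score(y_test, pred))
```

A model fitted once in this way is static. Handling concept drift would additionally require retraining or updating it as new instances arrive, and the abstract does not specify how that is done.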
Pages: 24
Related papers
50 in total
  • [31] GPU-Accelerated Extreme Learning Machines for Imbalanced Data Streams with Concept Drift
    Krawczyk, Bartosz
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 : 1692 - 1701
  • [32] An Ensemble Classifier Algorithm for Mining data Streams Based on Concept Drift
    Geng, Yushui
    Zhang, Jianguo
    2017 10TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2017, : 227 - 230
  • [33] Semi-supervised Ensemble Learning of Data Streams in the Presence of Concept Drift
    Ahmadi, Zahra
    Beigy, Hamid
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, 2012, 7209 : 526 - 537
  • [34] EMRIL: Ensemble Method based on ReInforcement Learning for binary classification in imbalanced drifting data streams
    Usman, Muhammad
    Chen, Huanhuan
    NEUROCOMPUTING, 2024, 605
  • [35] Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning
    Engelmann, Justin
    Lessmann, Stefan
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 174
  • [36] Concept Drift Detection in Data Stream: Ensemble Learning Method for Detecting Gradual Instances
    Khanh-Tung Nguyen
    Trung Tran
    Anh-Duc Nguyen
    Xuan-Hieu Phan
    Quang-Thuy Ha
    2023 ASIA MEETING ON ENVIRONMENT AND ELECTRICAL ENGINEERING, EEE-AM, 2023,
  • [37] Imbalanced Data Classification Method Based on Ensemble Learning
    Xiang, Yu
    Xie, Yongping
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, CSPS 2018, VOL III: SYSTEMS, 2020, 517 : 18 - 24
  • [38] Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data
    Meenal Jain
    Gagandeep Kaur
    Cluster Computing, 2021, 24 : 2099 - 2114
  • [39] GAN-SR Anomaly Detection Model Based on Imbalanced Data
    Wang, Shuang
    Chen, Hui
    Ding, Lei
    Sui, He
    Ding, Jianli
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (07) : 1209 - 1218
  • [40] Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data
    Jain, Meenal
    Kaur, Gagandeep
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2021, 24 (03): : 2099 - 2114