An ensemble learning method with GAN-based sampling and consistency check for anomaly detection of imbalanced data streams with concept drift

Cited by: 2
Authors
Liu, Yansong [1, 2]
Wang, Shuang [3 ]
Sui, He [4 ]
Zhu, Li [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian, Shaanxi, Peoples R China
[2] Shandong Management Univ, Sch Intelligent Engn, Jinan, Shandong, Peoples R China
[3] Civil Aviat Univ China, Informat Secur Evaluat Ctr Civil Aviat, Tianjin, Peoples R China
[4] Civil Aviat Univ China, Coll Aeronaut Engn, Tianjin, Peoples R China
Source
PLOS ONE | 2024 / Vol. 19 / No. 01
Keywords
DYNAMIC WEIGHTED MAJORITY;
DOI
10.1371/journal.pone.0292140
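Chinese Library Classification (CLC) code
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code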
07 ; 0710 ; 09 ;
Abstract
Many real-world data streams are both class-imbalanced and subject to concept drift, and handling the two together is one of the most critical challenges in anomaly detection. Learning from nonstationary data streams for anomaly detection has been well studied in recent years. However, most studies assume that the classes in the data stream are relatively balanced, and only a few approaches tackle the joint issue of imbalance and concept drift. To overcome this joint issue, we propose an ensemble learning method with generative adversarial network-based sampling and consistency check (EGSCC). First, we design a comprehensive anomaly detection framework that includes a generative adversarial network (GAN) oversampling module, an ensemble classifier, and a consistency check module. Next, we introduce double encoders into the GAN to better capture the distribution characteristics of imbalanced data for oversampling. Then, we apply stacking ensemble learning to deal with concept drift: four base classifiers (SVM, KNN, DT, and RF) are used in the first layer, and LR is used as the meta-classifier in the second layer. Furthermore, we perform a consistency check between each incremental instance and the check set to determine whether the instance is anomalous by statistical learning rather than by a threshold-based method, and the validation set is dynamically updated according to the consistency check result. Finally, three artificial data sets obtained from the Massive Online Analysis platform and two real data sets are used to verify the performance of the proposed method from four aspects: detection performance, parameter sensitivity, algorithm cost, and anti-noise ability. Experimental results show that the proposed method has significant advantages in anomaly detection of imbalanced data streams with concept drift.
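The two-layer stacking arrangement described in the abstract (SVM, KNN, DT, and RF as base classifiers, LR as the meta-classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `StackingClassifier`, a synthetic imbalanced batch from `make_classification` instead of a data stream, and arbitrary hyperparameters. The GAN oversampling and consistency check modules are left out, and the helper name `build_stacking_ensemble` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacking_ensemble(seed=0):
    """Hypothetical helper: four base classifiers stacked under an LR meta-classifier."""
    # First layer: SVM, KNN, DT, and RF (hyperparameters are illustrative assumptions).
    base_classifiers = [
        ("svm", SVC(random_state=seed)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
    ]
    # Second layer: logistic regression trained on cross-validated base outputs.
    return StackingClassifier(
        estimators=base_classifiers,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


# Synthetic stand-in for one imbalanced chunk: roughly 90% normal, 10% anomalous.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = build_stacking_ensemble()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

In a streaming setting the ensemble would have to be refit or updated as new chunks arrive; the abstract does not spell out that schedule, so the sketch trains on a single static batch.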
Pages: 24