Concept Drift Class Detection Based on Time Window

Authors
Guo H. [1 ,2 ]
Ren Q. [1 ]
Wang W. [1 ,2 ]
Affiliations
[1] School of Computer and Information Technology, Shanxi University, Taiyuan
[2] Key Laboratory of Computational Intelligence and Chinese Information Processing, Shanxi University, Ministry of Education, Taiyuan
Funding
National Natural Science Foundation of China
Keywords
Concept drift; Concept drift class; Drift span; Streaming data; Time window
DOI
10.7544/issn1000-1239.20200562
Abstract
As a new type of data, streaming data has been applied in many application fields. Because streaming data arrive quickly, in massive volumes, and continuously, online learning must scan them accurately in a single pass. Concept drift often occurs while streaming data are being continuously generated. Research on concept drift detection is now relatively mature. In practice, however, factors in the learning environment often evolve in different directions, so a single data stream may exhibit several different classes of concept drift, which poses new challenges for streaming data mining and online learning. To address this problem, this paper proposes a concept drift class detection method based on time window (CD-TW). The method uses a stack and a queue to access the data and a window mechanism to learn the streaming data in chunks. It detects the concept drift site by creating two basic site time windows, which load historical data and current data respectively, and comparing the distributions of the data they contain. It then creates a span time window that loads part of the data following the drift site. The drift span is obtained by analyzing the distribution stability of the data in the span time window and is then used to judge the concept drift class. Experimental results show that CD-TW not only detects concept drift sites accurately but also performs well in judging the class of concept drift. © 2022, Science Press. All rights reserved.
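The abstract does not specify which statistic compares the two basic site time windows, how the stability of the span time window is judged, or which drift classes are distinguished. The following is therefore only a minimal sketch under stated assumptions, not the CD-TW algorithm itself: it uses the two-sample Kolmogorov-Smirnov statistic as the distribution comparison, non-overlapping chunks of size `w`, one `threshold` for both drift detection and stability, and a hypothetical two-way rule (sudden vs. gradual) based on the drift span. All function names, the threshold value, and the toy data are illustrative assumptions. A `deque` serves as the queue holding the historical window; the paper's stack-based access is not reproduced.

```python
from collections import deque


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of samples a and b (an assumed choice of drift measure)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def detect_drift_site(stream, w, threshold):
    """Compare a historical window with the current window chunk by chunk.
    Returns the index where the first drifted chunk starts, or None."""
    hist = deque(stream[:w], maxlen=w)  # queue: old points drop out as new ones enter
    for start in range(w, len(stream) - w + 1, w):
        cur = stream[start:start + w]
        if ks_statistic(hist, cur) > threshold:
            return start
        hist.extend(cur)  # no drift: the current chunk becomes history
    return None


def measure_drift_span(stream, site, w, threshold):
    """Slide a span window forward from the drift site until two consecutive
    chunks have similar distributions; return the number of points covered."""
    prev = stream[site:site + w]
    for start in range(site + w, len(stream) - w + 1, w):
        nxt = stream[start:start + w]
        if ks_statistic(prev, nxt) <= threshold:
            return start - site  # distribution has stabilised
        prev = nxt
    return len(stream) - site  # never stabilised before the stream ended


def classify_drift(span, w):
    """Hypothetical rule: a span of one chunk means sudden drift, longer means gradual."""
    return "sudden" if span <= w else "gradual"


def make_chunk(offset, n):
    """Deterministic toy data: n values spread over [offset, offset + 0.9]."""
    return [offset + (i % 10) / 10 for i in range(n)]


if __name__ == "__main__":
    W, T = 20, 0.5
    sudden = make_chunk(0, 100) + make_chunk(10, 100)
    gradual = (make_chunk(0, 100)
               + [v for off in (2, 4, 6, 8) for v in make_chunk(off, W)]
               + make_chunk(10, 100))
    for name, s in (("sudden", sudden), ("gradual", gradual)):
        site = detect_drift_site(s, W, T)
        span = measure_drift_span(s, site, W, T)
        print(name, site, span, classify_drift(span, W))
```

Running the script prints `sudden 100 20 sudden` and `gradual 100 100 gradual`: both streams drift at index 100, but the gradual stream passes through four intermediate chunks before its distribution settles, so its drift span is longer. Comparing whole chunks keeps the sketch simple; a per-point sliding window would localize the drift site more finely at a higher cost.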
Pages: 127-143
Page count: 16