Iterative Subset Selection for Feature Drifting Data Streams

Cited by: 8
Authors
Yuan, Lanqin [1 ]
Pfahringer, Bernhard [2 ]
Barddal, Jean Paul [3 ]
Affiliations
[1] Univ Waikato, Hamilton, New Zealand
[2] Univ Auckland, Department Comp Sci, Auckland, New Zealand
[3] Pontificia Univ Catolica Parana, Programa Posgrad Informat, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Feature Selection; Concept Drift; Embedded Feature Selection; Iterative Subset Selection
DOI
10.1145/3167132.3167188
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Feature selection has been studied and shown to improve classifier performance in standard batch data mining, but it remains mostly unexplored in data stream mining. Feature selection becomes even more important when the relevant subset of features changes over time as the underlying concept of a data stream drifts. This specific kind of drift is known as feature drift, and it requires dedicated techniques not only to determine which features are the most important but also to take advantage of them. This paper presents Iterative Subset Selection (ISS), a novel feature subset selection method specialized for feature drift. ISS splits the feature selection process into two stages: it first ranks the features and then iteratively selects features from the ranking. Applying ISS together with Naive Bayes or k-Nearest Neighbour as the classifier yields compelling accuracy improvements over prior work.
Pages: 510-517
Page count: 8
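
The abstract describes a two-stage process: rank the features, then iteratively select from the ranking. The sketch below is only one plausible reading of that description, not the authors' ISS algorithm. The scoring rule (gap between class means), the nearest-centroid classifier standing in for Naive Bayes or k-Nearest Neighbour, and all function and variable names are assumptions made for illustration.

```python
from statistics import mean


def rank_features(X, y):
    """Stage 1: rank feature indices, best first.

    Assumed scoring rule: the gap between per-class feature means.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        class_means = [
            mean(row[j] for row, label in zip(X, y) if label == c)
            for c in classes
        ]
        scores.append(max(class_means) - min(class_means))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X, y, subset):
    """Accuracy of a nearest-centroid classifier restricted to `subset`.

    Placeholder for the Naive Bayes / kNN classifiers named in the abstract.
    Evaluated on the same window it is fitted on, purely to keep this short.
    """
    classes = sorted(set(y))
    centroids = {
        c: [mean(row[j] for row, label in zip(X, y) if label == c) for j in subset]
        for c in classes
    }
    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in subset]
        predicted = min(
            classes,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == label
    return correct / len(y)


def iterative_subset_selection(X, y):
    """Stage 2: walk down the ranking, keeping the smallest best-scoring prefix."""
    ranking = rank_features(X, y)
    best_subset, best_acc = [], 0.0
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        acc = centroid_accuracy(X, y, subset)
        if acc > best_acc:  # strict '>' prefers smaller subsets on ties
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Hypothetical window of stream instances: feature 0 separates the
# classes, feature 1 is noise.
window_X = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.0], [0.9, -5.0]]
window_y = [0, 0, 1, 1]
selected, accuracy = iterative_subset_selection(window_X, window_y)
print(selected, accuracy)
```

In a streaming setting, a routine like this would presumably be re-run over a sliding window of recent instances, so that the selected subset can follow the features that are currently relevant.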