Iterative Subset Selection for Feature Drifting Data Streams

Times cited: 8
Authors
Yuan, Lanqin [1]
Pfahringer, Bernhard [2]
Barddal, Jean Paul [3]
Affiliations
[1] Univ Waikato, Hamilton, New Zealand
[2] Univ Auckland, Department Comp Sci, Auckland, New Zealand
[3] Pontificia Univ Catolica Parana, Programa Posgrad Informat, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Feature Selection; Concept Drift; Embedded Feature Selection; Iterative Subset Selection;
DOI
10.1145/3167132.3167188
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Feature selection has been studied and shown to improve classifier performance in standard batch data mining, but it remains mostly unexplored in data stream mining. Feature selection becomes even more important when the relevant subset of features changes over time as the underlying concept of a data stream drifts. This kind of drift is known as feature drift, and it requires specific techniques not only to determine which features are the most important but also to take advantage of them. This paper presents Iterative Subset Selection (ISS), a novel feature subset selection method specialized for dealing with feature drifts. ISS splits the feature selection process into two stages: it first ranks the features and then iteratively selects features from the ranking. Applying our feature selection method together with Naive Bayes or k-Nearest Neighbour as the classifier results in compelling accuracy improvements compared to prior work.
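The two-stage idea described in the abstract (rank the features first, then select iteratively from the ranking) can be illustrated with a minimal sketch. This is not the authors' ISS implementation: the abstract does not name the ranking criterion or the measure used to compare candidate subsets, so this sketch assumes information gain over discrete features for stage one and leave-one-out 1-NN accuracy on a window of recent instances for stage two. The function names `rank_features`, `knn_accuracy` and `iterative_subset_selection` and the toy window are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence, labels: Sequence) -> float:
    """Information gain of one discrete feature w.r.t. the class (assumed ranking criterion)."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(window: Sequence[Sequence], labels: Sequence) -> List[int]:
    """Stage 1 (sketch): rank feature indices by information gain, best first."""
    n_features = len(window[0])
    scores = [information_gain([row[j] for row in window], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def knn_accuracy(window: Sequence[Sequence], labels: Sequence, subset: List[int]) -> float:
    """Hypothetical subset score: leave-one-out 1-NN accuracy, Hamming distance on `subset`."""
    correct = 0
    for i, row in enumerate(window):
        nearest = min((j for j in range(len(window)) if j != i),
                      key=lambda j: sum(row[f] != window[j][f] for f in subset))
        correct += labels[nearest] == labels[i]
    return correct / len(window)


def iterative_subset_selection(ranking: List[int],
                               evaluate: Callable[[List[int]], float]) -> List[int]:
    """Stage 2 (sketch): grow the subset along the ranking, keep the best-scoring prefix."""
    best_subset, best_score = ranking[:1], float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        score = evaluate(subset)
        if score > best_score:  # strict '>' prefers the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset


# Toy window: the class equals feature 0; features 1 and 2 are noise.
window = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 1)]
labels = [0, 0, 1, 1, 0, 1]

ranking = rank_features(window, labels)
selected = iterative_subset_selection(ranking, lambda s: knn_accuracy(window, labels, s))
print(ranking, selected)  # prints [0, 1, 2] [0]
```

Because both stages here are computed from a window of recent instances, rerunning them on each new window is one way the selected subset could follow a feature drift; the window size and how often to rerun are further assumptions of this sketch.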
Pages: 510-517
Number of pages: 8
Related papers (50 in total)
  • [1] Addressing Feature Drift in Data Streams Using Iterative Subset Selection
    Yuan, Lanqin
    Pfahringer, Bernhard
    Barddal, Jean Paul
    APPLIED COMPUTING REVIEW, 2019, 19 (01): 20 - 33
  • [2] Feature subset selection for data and feature streams: a review
    Carlos Villa-Blanco
    Concha Bielza
    Pedro Larrañaga
    Artificial Intelligence Review, 2023, 56 : 1011 - 1062
  • [3] Feature subset selection for data and feature streams: a review
    Villa-Blanco, Carlos
    Bielza, Concha
    Larranaga, Pedro
    ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (SUPPL 1) : 1011 - 1062
  • [4] A Benchmark of Classifiers on Feature Drifting Data Streams
    Barddal, Jean Paul
    Gomes, Heitor Murilo
    Britto, Alceu de Souza, Jr.
    Enembreck, Fabricio
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2180 - 2185
  • [5] Online feature subset selection for mining feature streams in big data via incremental learning and evolutionary computation
    Vivek, Yelleti
    Ravi, Vadlamani
    Krishna, P. Radha
    SWARM AND EVOLUTIONARY COMPUTATION, 2025, 94
  • [6] Fair and Representative Subset Selection from Data Streams
    Wang, Yanhao
    Fabbri, Francesco
    Mathioudakis, Michael
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 1340 - 1350
  • [7] Feature subset selection with applications to hyperspectral data
    Chen, H
    Varshney, PK
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 249 - 252
  • [8] A Model-Selection Framework for Concept-Drifting Data Streams
    Chen, Bo-Heng
    Chuang, Kun-Ta
    2014 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2014, : 290 - 296
  • [9] Decision tree-based Feature Ranking in Concept Drifting Data Streams
    Pereira Karax, Jean Antonio
    Malucelli, Andreia
    Barddal, Jean Paul
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 590 - 592
  • [10] A conservative feature subset selection algorithm with missing data
    Aussem, Alex
    de Morais, Sergio Rodrigues
    NEUROCOMPUTING, 2010, 73 (4-6) : 585 - 590