Iterative Subset Selection for Feature Drifting Data Streams

Cited by: 8
Authors
Yuan, Lanqin [1 ]
Pfahringer, Bernhard [2 ]
Barddal, Jean Paul [3 ]
Affiliations
[1] Univ Waikato, Hamilton, New Zealand
[2] Univ Auckland, Department Comp Sci, Auckland, New Zealand
[3] Pontificia Univ Catolica Parana, Programa Posgrad Informat, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Feature Selection; Concept Drift; Embedded Feature Selection; Iterative Subset Selection;
DOI
10.1145/3167132.3167188
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Feature selection has been studied and shown to improve classifier performance in standard batch data mining, but it is mostly unexplored in data stream mining. Feature selection becomes even more important when the relevant subset of features changes over time as the underlying concept of a data stream drifts. This specific kind of drift is known as feature drift, and it requires specific techniques not only to determine which features are the most important but also to take advantage of them. This paper presents Iterative Subset Selection (ISS), a novel feature subset selection method specialized for dealing with feature drifts. ISS splits the feature selection process into two stages: it first ranks the features and then iteratively selects features from the ranking. Applying our feature selection method together with Naive Bayes or k-Nearest Neighbour as a classifier results in compelling accuracy improvements compared to prior work.
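The abstract names the two stages (rank, then iteratively select from the ranking) but gives no further detail. The following is only a minimal sketch of that general idea on a single window of labelled instances, not the authors' ISS algorithm. The scoring rule (absolute difference of class means), the nearest-centroid classifier, the stopping rule, and all function names are assumptions made for illustration.

```python
from statistics import mean

def rank_features(X, y):
    """Stage 1 (assumed scoring): rank feature indices, best first,
    by |mean of class 1 - mean of class 0|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid_accuracy(X, y, subset):
    """Accuracy on the window of a nearest-centroid classifier that
    only sees the features in `subset` (a stand-in classifier)."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in subset]
    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in subset]
        pred = min(centroids, key=lambda c: sq_dist(point, centroids[c]))
        correct += pred == label
    return correct / len(y)

def iterative_subset(X, y):
    """Stage 2 (assumed stopping rule): grow the subset along the
    ranking and stop as soon as accuracy no longer improves."""
    ranking = rank_features(X, y)
    best_subset = ranking[:1]
    best_acc = centroid_accuracy(X, y, best_subset)
    for k in range(2, len(ranking) + 1):
        candidate = ranking[:k]
        acc = centroid_accuracy(X, y, candidate)
        if acc <= best_acc:
            break
        best_subset, best_acc = candidate, acc
    return best_subset, best_acc

# Toy window: feature 0 separates the classes, feature 1 is noise,
# feature 2 is weakly related to the label.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 0.0], [0.2, 3.0, 1.0],
     [1.0, 4.0, 0.0], [0.9, 2.0, 1.0], [1.1, 3.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]
subset, acc = iterative_subset(X, y)
print(subset, acc)
```

In a streaming setting one would presumably rerun both stages on a sliding window, so that the selected subset can follow the feature drift. That windowing strategy is likewise an assumption and is not stated in the abstract.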
Pages: 510-517
Page count: 8