An outlier detection approach in large-scale data stream using rough set

Cited by: 8
Authors
Singh, Manmohan [1]
Pamula, Rajendra [1]
Affiliations
[1] Indian Sch Mines, Indian Inst Technol, Dept Comp Sci & Engn, Dhanbad 826004, Jharkhand, India
Source
NEURAL COMPUTING & APPLICATIONS | 2020 / Vol. 32 / Issue 13
Keywords
Relative information entropy; Outlier detection; Rough sets; Data mining; Indiscernible sets; INFORMATION-ENTROPY; UNCERTAINTY; REDUCTION;
DOI
10.1007/s00521-019-04421-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier detection has become an important research area in stream data mining because of its wide range of applications. Many methods have been proposed in the literature, but they work well only for simple, positive regions of outliers and give little attention to boundary regions. Moreover, an algorithm that processes stream data must be effective and able to handle unbounded data in one pass or a limited number of passes. These problems motivated us to propose an outlier detection approach for large-scale data streams. The proposed algorithm employs the concept of relative cardinality, the entropy outlier factor theory of information-based systems, and a size-variant sliding window over the stream data. In addition, we propose a new methodology for concept drift adaptation on evolving data streams. The proposed method is evaluated on nine benchmark datasets and compared with six existing methods: EXPoSE, iForest, OC-SVM, LOF, KDE, and FastAbod. Experimental results show that the proposed method outperforms the six existing methods in terms of receiver operating characteristic curve, precision-recall, and computational time, for positive regions as well as boundary regions.
Pages: 9113 - 9127
Page count: 15
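
The abstract names three ingredients: indiscernibility classes with relative cardinality, an entropy-based outlier factor, and a sliding window over the stream. The sketch below is only a simplified illustration of those ideas and is not the authors' algorithm. It assumes categorical rows stored as tuples, a fixed-size window (the paper describes a size-variant one), and a hypothetical rarity score defined as one minus the mean relative cardinality of a row's single-attribute indiscernibility classes. The names `partition_entropy`, `rarity_scores`, `stream_outliers`, and the default `window_size` and `threshold` values are all assumptions made for this example.

```python
from collections import Counter, deque
from math import log2


def partition_entropy(rows, attrs):
    """Shannon entropy of the partition that indiscernibility on `attrs` induces."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def rarity_scores(rows, attrs):
    """Hypothetical score: 1 - mean relative cardinality of each row's
    single-attribute indiscernibility classes (higher means rarer)."""
    n = len(rows)
    per_attr = [Counter(r[a] for r in rows) for a in attrs]
    return [
        1 - sum(cnt[r[a]] for a, cnt in zip(attrs, per_attr)) / (n * len(attrs))
        for r in rows
    ]


def stream_outliers(stream, attrs, window_size=5, threshold=0.6):
    """Single pass over the stream; flag the newest row of each full window
    when its rarity score exceeds the threshold (fixed-size window, assumed)."""
    window = deque(maxlen=window_size)
    flagged = []
    for i, row in enumerate(stream):
        window.append(row)
        if len(window) == window_size:
            if rarity_scores(list(window), attrs)[-1] > threshold:
                flagged.append(i)
    return flagged


# Toy stream: one row that differs from its neighbours on both attributes.
rows = [("a", "x")] * 5 + [("b", "y")] + [("a", "x")] * 2
flagged = stream_outliers(rows, attrs=[0, 1])
```

On this toy stream only the row at index 5 is flagged. A real system would replace the fixed threshold and window size with the adaptive choices the abstract refers to, which this record does not specify.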