An Efficient Outlier Detection Approach Over Uncertain Data Stream Based on Frequent Itemset Mining

Cited by: 7
Authors
Hao, Shangbo [1]
Cai, Saihua [1]
Sun, Ruizhi [1]
Li, Sicong [1]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
Source
INFORMATION TECHNOLOGY AND CONTROL | 2019 / Vol. 48 / No. 01
Keywords
outlier detection; frequent itemset mining; uncertain data stream; outlier factors; WINDOW
DOI
10.5755/j01.itc.48.1.21162
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Outlier detection is essential in data-based science. It aims to detect itemsets that differ significantly from the rest of the data. Because of limits on equipment precision and network transmission, uncertain data are becoming more common in daily life. However, traditional outlier detection methods are not applicable to uncertain data streams, and the large volume of data makes outlier detection costly in terms of memory usage and time. Moreover, the multiple scans of the data stream required by Apriori-like methods are unrealistic. In this paper, a matrix structure is constructed to store the information of an uncertain data stream, and the subsequent mining process is conducted on the matrix structure; therefore, the whole data stream needs to be scanned only once. Then, the "upper cap" concept is used in the FIM-UDS method to mine the frequent itemsets more effectively to support outlier detection. Moreover, two outlier factors and an outlier detection method called FIM-UDSOD are designed to detect potential outliers. Finally, two public datasets are used to verify the efficiency of the FIM-UDS method, and one synthetic dataset is used to evaluate the FIM-UDSOD method. The experimental results show that the proposed FIM-UDSOD method is more effective than other methods in detecting outliers.
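Illustration (not from the paper): the abstract's idea of scanning the stream once into a matrix and then mining on the matrix alone can be sketched as below. This is a minimal sketch, assuming the common expected-support model in which each item carries an independent existence probability. The function names, the toy stream, and the brute-force enumeration are hypothetical; the "upper cap" pruning of FIM-UDS and the two outlier factors of FIM-UDSOD are not defined in this record and are not reproduced here.

```python
from itertools import combinations


def build_matrix(stream, items):
    """Single pass over the stream: one row per transaction, one column
    per item, each cell holding the item's existence probability."""
    index = {item: j for j, item in enumerate(items)}
    matrix = []
    for transaction in stream:  # the stream is read exactly once
        row = [0.0] * len(items)
        for item, prob in transaction.items():
            row[index[item]] = prob
        matrix.append(row)
    return matrix


def expected_support(matrix, cols):
    """Expected support of an itemset (given as column indices):
    sum over rows of the product of its items' probabilities
    (assumes items are independent)."""
    total = 0.0
    for row in matrix:
        p = 1.0
        for j in cols:
            p *= row[j]
        total += p
    return total


def frequent_itemsets(matrix, items, min_esup, max_len=2):
    """Brute-force enumeration on the matrix only (no rescan of the
    stream). A real miner would prune candidates; this one does not."""
    result = {}
    for k in range(1, max_len + 1):
        for cols in combinations(range(len(items)), k):
            esup = expected_support(matrix, cols)
            if esup >= min_esup:
                result[tuple(items[j] for j in cols)] = esup
    return result


# Hypothetical toy stream: item -> existence probability per transaction.
items = ["a", "b", "c"]
stream = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.4},
    {"a": 1.0, "b": 0.5, "c": 0.2},
]
matrix = build_matrix(stream, items)
frequent = frequent_itemsets(matrix, items, min_esup=1.0)
```

With a threshold of 1.0, the itemsets ("a",), ("b",) and ("a", "b") pass, while every itemset containing "c" falls below it. Transactions that contain few or none of the frequent itemsets are the natural outlier candidates in this family of methods.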
Pages: 34-46
Page count: 13
Related papers
50 in total
  • [21] Constrained Frequent Itemset Mining from Uncertain Data Streams
    Leung, Carson Kai-Sang
    Hao, Boyu
    Jiang, Fan
    2010 IEEE 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDE 2010), 2010 : 120 - 127
  • [22] Fast Algorithms for Frequent Itemset Mining from Uncertain Data
    Leung, Carson Kai-Sang
    MacKinnon, Richard Kyle
    Tanbeer, Syed K.
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014 : 893 - 898
  • [23] An uncertainty-based approach: Frequent itemset mining from uncertain data with different item importance
    Lee, Gangin
    Yun, Unil
    Ryang, Heungmo
    KNOWLEDGE-BASED SYSTEMS, 2015, 90 : 239 - 256
  • [24] An algorithm for mining constrained maximal frequent itemset in uncertain data
    Du, Haizhou
    Journal of Information and Computational Science, 2012, 9 (15) : 4509 - 4515
  • [25] Frequent positive and negative (FPN) itemset approach for outlier detection
    Suhailis, Anis
    Kadir, Abdul
    Abu Bakar, Azuraliza
    Hamdan, Abdul Razak
    INTELLIGENT DATA ANALYSIS, 2014, 18 (06) : 1049 - 1065
  • [26] An efficient algorithm for frequent itemset mining on data streams
    Xie Zhi-Jun
    Chen Hong
    Li, Cuiping
    ADVANCES IN DATA MINING: APPLICATIONS IN MEDICINE, WEB MINING, MARKETING, IMAGE AND SIGNAL MINING, 2006, 4065 : 474 - 491
  • [27] An Efficient Closed Frequent Itemset Miner for the MOA Stream Mining System
    Quadrana, Massimo
    Bifet, Albert
    Gavalda, Ricard
    ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE OF THE CATALAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE, 2013, 256 : 203 - 212
  • [28] Probabilistic Frequent Itemset Mining Algorithm over Uncertain Databases with Sampling
    Li, Hai-Feng
    Zhang, Ning
    Zhang, Yue-Jin
    Wang, Yue
    FUZZY SYSTEMS AND DATA MINING II, 2016, 293 : 159 - 166
  • [29] AT-Mine: An Efficient Algorithm of Frequent Itemset Mining on Uncertain Dataset
    Wang, Le
    Feng, Lin
    Wu, Mingfei
    JOURNAL OF COMPUTERS, 2013, 8 (06) : 1417 - 1426
  • [30] Probabilistic maximal frequent itemset mining methods over uncertain databases
    Li, Haifeng
    Hai, Mo
    Zhang, Ning
    Zhu, Jianming
    Wang, Yue
    Cao, Huaihu
    INTELLIGENT DATA ANALYSIS, 2019, 23 (06) : 1219 - 1241