WMFP-Outlier: An Efficient Maximal Frequent-Pattern-Based Outlier Detection Approach for Weighted Data Streams

Cited by: 10
Authors
Cai, Saihua [1]
Li, Qian [1]
Li, Sicong [1]
Yuan, Gang [1]
Sun, Ruizhi [1, 2]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
[2] Minist Agr, Sci Res Base Integrated Technol Precis Agr Anim H, Beijing 100083, Peoples R China
Source
INFORMATION TECHNOLOGY AND CONTROL | 2019, Vol. 48, Issue 04
Keywords
outlier detection; weighted maximal frequent-pattern mining; weighted data stream; deviation indices; data mining;
DOI
10.5755/j01.itc.48.4.22176
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Because outliers are a major factor affecting accuracy in data science, many outlier detection approaches have been proposed for identifying implicit outliers in static datasets and thereby improving data reliability. In recent years, data streams have become the main form of data, and the elements of a data stream are not always of equal importance. However, existing outlier detection approaches do not consider weights and are therefore not suitable for processing weighted data streams. In addition, traditional pattern-based outlier detection approaches incur a high time cost in the outlier detection phase. To overcome these problems, this paper proposes a two-phase pattern-based outlier detection approach, WMFP-Outlier, for detecting implicit outliers in a weighted data stream; it uses maximal frequent patterns instead of all frequent patterns to accelerate outlier detection. In the maximal frequent-pattern mining phase, the anti-monotonicity property and the MFP-array structure are used to accelerate mining. In the outlier detection phase, three deviation indices are designed to measure the degree of abnormality of each transaction, and the transactions with the highest degrees of abnormality are judged to be outliers. Finally, several experiments are conducted on a synthetic dataset to evaluate the performance of WMFP-Outlier. The results show that WMFP-Outlier is more accurate than the existing pattern-based outlier detection approaches, and that the time cost of its outlier detection phase is lower than that of the four other pattern-based approaches compared.
Pages: 505 - 521
Page count: 17
Related papers
50 in total
  • [21] Minimal Rare Pattern-Based Outlier Detection Approach For Uncertain Data Streams Under Monotonic Constraints
    Cai, Saihua
    Chen, Jinfu
    Chen, Haibo
    Zhang, Chi
    Li, Qian
    Shi, Dengzhou
    Lin, Wei
    COMPUTER JOURNAL, 2023, 66 (01): : 16 - 34
  • [22] Sliding window based weighted maximal frequent pattern mining over data streams
    Lee, Gangin
    Yun, Unil
    Ryu, Keun Ho
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (02) : 694 - 708
  • [23] DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams
    Na, Gyoung S.
    Kim, Donghyun
    Yu, Hwanjo
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 1993 - 2002
  • [24] An Efficient Approach for Outlier Detection with Imperfect Data Labels
    Liu, Bo
    Xiao, Yanshan
    Yu, Philip S.
    Hao, Zhifeng
    Cao, Longbing
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (07) : 1602 - 1616
  • [25] An Outlier Detection Algorithm for Data Streams Based on Fuzzy Clustering
    Su, Xiaoke
    Qin, Yuming
    Wan, Renxia
    PROGRESS IN INTELLIGENCE COMPUTATION AND APPLICATIONS, 2008, : 109 - 112
  • [26] Explainable Distance-Based Outlier Detection in Data Streams
    Toliopoulos, Theodoros
    Gounaris, Anastasios
    IEEE ACCESS, 2022, 10 : 47921 - 47936
  • [27] An Efficient Distance and Density Based Outlier Detection Approach
    Zhong, Xunbiao
    Huang, Xiaoxia
    MECHANICAL ENGINEERING AND GREEN MANUFACTURING II, PTS 1 AND 2, 2012, 155-156 : 342 - 347
  • [28] An Efficient Density-Based Local Outlier Detection Approach for Scattered Data
    Su, Shubin
    Xiao, Limin
    Ruan, Li
    Gu, Fei
    Li, Shupan
    Wang, Zhaokai
    Xu, Rongbin
    IEEE ACCESS, 2019, 7 : 1006 - 1020
  • [29] Fast Memory Efficient Local Outlier Detection in Data Streams (Extended Abstract)
    Salehi, Mahsa
    Leckie, Christopher
    Bezdek, James C.
    Vaithianathan, Tharshan
    Zhang, Xuyun
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 51 - 52
  • [30] KDE based outlier detection on distributed data streams in multimedia network
    Zheng, Zhigao
    Jeong, Hwa-Young
    Huang, Tao
    Shu, Jiangbo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (17) : 18027 - 18045