Outlier detection by example

Cited by: 11
Authors
Zhu, Cui [1]
Kitagawa, Hiroyuki [2]
Papadimitriou, Spiros [3]
Faloutsos, Christos [4]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[2] Univ Tsukuba, Grad Sch Syst & Informat Engn, Ctr Computat Sci, Tsukuba, Ibaraki 3058577, Japan
[3] IBM T J Watson, Hawthorne, NY USA
[4] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Outlier detection; Outlier example; Data mining; Machine learning
DOI
10.1007/s10844-010-0128-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is a useful technique in such areas as fraud detection, financial analysis and health monitoring. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier (e.g., distance-based, density-based, etc.). However, the definition of an outlier differs between users or even datasets. This paper presents a solution to this problem by including input from the users. Our OBE (Outlier By Example) system is the first that allows users to provide examples of outliers in low-dimensional datasets. By incorporating a small number of such examples, OBE can successfully develop an algorithm by which to identify further outliers based on their outlierness. Several algorithmic challenges and engineering decisions must be addressed in building such a system. We describe the key design decisions and algorithms in this paper. In order to interact with users having different degrees of domain knowledge, we develop two detection schemes: OBE-Fraction and OBE-RF. Our experiments on both real and synthetic datasets demonstrate that OBE can discover values that a user would consider outliers.
Pages: 217-247
Page count: 31
Related papers
50 items in total
  • [1] Outlier detection by example
    Cui Zhu
    Hiroyuki Kitagawa
    Spiros Papadimitriou
    Christos Faloutsos
    Journal of Intelligent Information Systems, 2011, 36 : 217 - 247
  • [2] DB-Outlier detection by example in high dimensional datasets
    Li, Yuan
    Kitagawa, Hiroyuki
    2007 IEEE INTERNATIONAL WORKSHOP ON DATABASES FOR NEXT GENERATION RESEARCHERS, 2007, : 73 - +
  • [3] OBE: Outlier by example
    Zhu, C
    Kitagawa, H
    Papadimitriou, S
    Faloutsos, C
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2004, 3056 : 222 - 234
  • [4] Toward Scalable and Unified Example-Based Explanation and Outlier Detection
    Chong, Penny
    Cheung, Ngai-Man
    Elovici, Yuval
    Binder, Alexander
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 525 - 540
  • [5] Example-based robust outlier detection in high dimensional datasets
    Zhu, C
    Kitagawa, H
    Faloutsos, C
    FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, : 829 - 832
  • [6] Outlier detection
    Su, Xiaogang
    Tsai, Chih-Ling
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 1 (03) : 261 - 268
  • [7] Example-based robust DB-Outlier detection for high dimensional data
    Li, Yuan
    Kitagawa, Hiroyuki
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2008, 4947 : 330 - +
  • [8] A Comparative Study of Cluster Based Outlier Detection, Distance Based Outlier Detection and Density Based Outlier Detection Techniques
    Mandhare, Harshada C.
    Idate, S. R.
    2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2017, : 931 - 935
  • [9] Functional Outlier Detection
    Oguamalam, Jeremy
    Radojicic, Una
    Filzmoser, Peter
    COMBINING, MODELLING AND ANALYZING IMPRECISION, RANDOMNESS AND DEPENDENCE, SMPS 2024, 2024, 1458 : 325 - 333
  • [10] Neighborhood outlier detection
    Chen, Yumin
    Miao, Duoqian
    Zhang, Hongyun
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (12) : 8745 - 8749