A predictive DEA model for outlier detection

Cited by: 8
Authors
Yang, Mingwen [1, 2]
Wan, Guohua [1 ]
Zheng, Eric [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200030, Peoples R China
[2] Univ Texas Dallas, Naveen Jindal Sch Management, Richardson, TX 75080 USA
Keywords
predictive DEA; Bi-super DEA; outlier detection; simulation
DOI
10.1080/23270012.2014.889911
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
Outlier detection is a key issue in data-driven analytics. In this paper, we propose Bi-super DEA, a super DEA-based method that constructs both efficient and inefficient frontiers for outlier detection. To evaluate its predictive performance, we develop a novel predictive DEA procedure, PDEA, which extends conventional DEA approaches, which have been used primarily for in-sample efficiency estimation, to predict outputs for out-of-sample observations. This enables us to compare the predictive performance of our approach against several popular outlier detection methods, including parametric robust regression in statistics and non-parametric k-means in data mining. We conduct comprehensive simulation experiments to examine the relative performance of these outlier detection methods under the influence of five factors: sample size, linearity of the production function, normality of the noise distribution, homogeneity of the data, and the level of random noise contaminating the data generating process (DGP). We find that, somewhat surprisingly, Bi-super CCR consistently outperforms Bi-super BCC in detecting outliers. Under the linearity, normality and homogeneity conditions, the parametric robust regression method works best. However, when the DGP violates these conditions, Bi-super DEA emerges as the better choice due to its distribution-free property. Our results shed light on the conditions under which each method excels or fails and provide users with practical guidelines on how to choose appropriate methods to detect outliers.
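For readers unfamiliar with super DEA, the following is a minimal sketch of the standard input-oriented super-efficiency CCR score, which evaluates each unit against a frontier built from all other units. It is not the authors' Bi-super DEA or PDEA procedure: it covers only the efficient-frontier side, and the function name `super_efficiency_ccr`, the toy data, and the flagging threshold of 2.0 are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def super_efficiency_ccr(X, Y):
    """Input-oriented super-efficiency CCR scores (illustrative sketch).

    X: (n_units, n_inputs) array, Y: (n_units, n_outputs) array,
    both assumed strictly positive. Unit k is scored against a
    frontier formed by every unit except k, so scores can exceed 1.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    scores = np.empty(n)
    for k in range(n):
        others = np.delete(np.arange(n), k)
        Xo, Yo = X[others], Y[others]
        # Decision variables: [theta, lambda_1, ..., lambda_{n-1}]
        c = np.r_[1.0, np.zeros(n - 1)]
        # Inputs: sum_j lambda_j * x_ij - theta * x_ik <= 0
        A_in = np.hstack([-X[k][:, None], Xo.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_rk
        A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Yo.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(X.shape[1]), -Y[k]]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
        scores[k] = res.x[0] if res.success else np.nan
    return scores


# Toy data (hypothetical): one input, one output, four units.
X = np.array([[1.0], [1.0], [1.0], [1.0]])
Y = np.array([[1.0], [2.0], [3.0], [10.0]])
scores = super_efficiency_ccr(X, Y)

# Hypothetical rule: flag units whose score is far above 1.
threshold = 2.0
flagged = np.where(scores > threshold)[0]
print(scores, flagged)
```

In this toy data the fourth unit produces far more output than any other unit with the same input, so its super-efficiency score lies well above 1 and it is the only unit flagged. The threshold would need to be tuned in practice, and detecting outliers on the inefficient side requires the second frontier described in the abstract, which this sketch does not attempt.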
Pages: 20-41
Page count: 22
Related papers
50 in total
  • [31] A Model-based Approach for Text Clustering with Outlier Detection
    Yin, Jianhua
    Wang, Jianyong
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 625 - 636
  • [32] A modified hidden Markov model for outlier detection in multivariate datasets
    Manoharan, G.
    Sivakumar, K.
    INTERNATIONAL JOURNAL OF ENGINEERING SYSTEMS MODELLING AND SIMULATION, 2024, 15 (03) : 121 - 128
  • [33] GMDH-Based Outlier Detection Model in Classification Problems
    Xie, Ling
    Jia, Yanlin
    Xiao, Jin
    Gu, Xin
    Huang, Jing
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2020, 33 (05) : 1516 - 1532
  • [34] A New Outlier Detection Method Considering Outliers As Model Errors
    Hekimoglu, S.
    Erdogan, B.
    Erenoglu, R. C.
    EXPERIMENTAL TECHNIQUES, 2015, 39 (01) : 57 - 68
  • [35] Model-based clustering and outlier detection with missing data
    Tong, Hung
    Tortora, Cristina
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2022, 16 (01) : 5 - 30
  • [36] VOS: A new outlier detection model using virtual graph
    Wang, Chao
    Liu, Zhen
    Gao, Hui
    Fu, Yan
    KNOWLEDGE-BASED SYSTEMS, 2019, 185
  • [37] Outlier Detection in Balanced Replicated Linear Functional Relationship Model
    Arif, Azuraini Mohd
    Zubairi, Yong Zulina
    Hussin, Abdul Ghapor
    SAINS MALAYSIANA, 2022, 51 (02) : 599 - 607
  • [38] A Mixture Model-Based Combination Approach for Outlier Detection
    Bouguessa, Mohamed
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2014, 23 (04)
  • [39] Robust Response Transformation Using Outlier Detection in Regression Model
    Seo, Han Son
    Lee, Ga Yoen
    Yoon, Min
    KOREAN JOURNAL OF APPLIED STATISTICS, 2012, 25 (01) : 205 - 213
  • [40] Outlier Detection in a Circular Regression Model Using COVRATIO Statistic
    Ibrahim, S.
    Rambli, A.
    Hussin, A. G.
    Mohamed, I.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2013, 42 (10) : 2272 - 2280