A predictive DEA model for outlier detection

Cited by: 8

Authors:

Yang, Mingwen [1,2]

Wan, Guohua [1]

Zheng, Eric [2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200030, Peoples R China

[2] Univ Texas Dallas, Naveen Jindal Sch Management, Richardson, TX 75080 USA

Source:

JOURNAL OF MANAGEMENT ANALYTICS | 2014, Vol. 1, No. 1

Keywords:

predictive DEA; Bi-super DEA; outlier detection; simulation

DOI:

10.1080/23270012.2014.889911

Chinese Library Classification (CLC):

F [Economics]

Discipline classification code:

02

Abstract:

Outlier detection is one of the key issues in any data-driven analytics. In this paper, we propose Bi-super DEA, a super DEA-based method that constructs both efficient and inefficient frontiers for outlier detection. To evaluate its predictive performance, we develop a novel predictive DEA procedure, PDEA, which extends conventional DEA approaches, primarily used for in-sample efficiency estimation, to predict outputs out of sample. This enables us to compare the predictive performance of our approach against several popular outlier detection methods, including parametric robust regression from statistics and non-parametric k-means from data mining. We conduct comprehensive simulation experiments to examine the relative performance of these outlier detection methods under the influence of five factors: sample size, linearity of the production function, normality of the noise distribution, homogeneity of the data, and the level of random noise contaminating the data generating process (DGP). We find that, somewhat surprisingly, Bi-super CCR consistently outperforms Bi-super BCC in detecting outliers. Under the linearity, normality and homogeneity conditions, the parametric robust regression method works best. However, when the DGP violates these conditions, Bi-super DEA emerges as the better choice due to its distribution-free property. Our results shed light on the conditions under which each method excels or fails and provide users with practical guidelines on how to choose appropriate methods to detect outliers.


Pages: 20-41

Page count: 22
