Outliers Detection for Pareto Distributed Data

Cited by: 2
Authors
Safari, M. A. Mohd [1]
Masseran, N. [1]
Ibrahim, K. [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Sci & Technol, Sch Math Sci, Bangi 43600, Selangor, Malaysia
Source
Keywords
BOXPLOT; TAIL;
DOI
10.1063/1.5028034
Chinese Library Classification (CLC) number
O59 [Applied Physics];
Subject classification code
Abstract
This study examines the presence of outliers in the upper tail of the Malaysian income distribution, under the assumption that the data follow a Pareto model. Three types of boxplot are considered for this purpose: the standard boxplot, the adjusted boxplot and the generalized boxplot. Their performance is assessed in a simulation study in which data were simulated from the Pareto distribution P(1, alpha) with alpha = 2, 3, 4, and then contaminated by replacing a proportion epsilon (3%, 5%, 10%) of randomly selected observations. The generalized boxplot is found to give a higher power value than the standard and adjusted boxplots. The generalized boxplot was therefore used to determine the presence of outliers in the upper tail of the income distribution, while the threshold for Pareto tail modelling was determined using Van Kerm's formula. In the household income data exceeding the threshold, the generalized boxplot detected 0.4%, 0.4%, 0.9% and 1.2% outliers for the years 2007, 2009, 2012 and 2014, respectively.
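The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say what the contaminating values are, so the `shift` factor used to generate them is purely an assumption, as are all function names. Only the standard boxplot rule (upper fence at Q3 + 1.5 IQR) is shown; the adjusted and generalized boxplots are not implemented here.

```python
import random
import statistics


def simulate_contaminated_pareto(n, alpha, eps, shift=50.0, seed=0):
    """Draw n values from P(1, alpha), then replace a proportion eps of them.

    ASSUMPTION: contaminating values are drawn as shift * P(1, alpha);
    the abstract does not specify the contaminating distribution.
    """
    rng = random.Random(seed)
    # random.paretovariate(alpha) has scale 1, i.e. values >= 1
    data = [rng.paretovariate(alpha) for _ in range(n)]
    k = round(eps * n)
    contaminated = set(rng.sample(range(n), k))
    for i in sorted(contaminated):
        data[i] = shift * rng.paretovariate(alpha)
    return data, contaminated


def standard_upper_fence(data):
    """Upper fence of the standard boxplot: Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 + 1.5 * (q3 - q1)


def detection_rates(data, contaminated):
    """Return (power, false flag rate) for the standard upper fence.

    power: share of contaminated points above the fence.
    false flag rate: share of uncontaminated points above the fence.
    """
    fence = standard_upper_fence(data)
    flagged = {i for i, x in enumerate(data) if x > fence}
    power = len(flagged & contaminated) / len(contaminated)
    n_clean = len(data) - len(contaminated)
    false_rate = len(flagged - contaminated) / n_clean
    return power, false_rate


if __name__ == "__main__":
    for alpha in (2.0, 3.0, 4.0):
        for eps in (0.03, 0.05, 0.10):
            data, idx = simulate_contaminated_pareto(1000, alpha, eps, seed=1)
            power, false_rate = detection_rates(data, idx)
            print(f"alpha={alpha} eps={eps:.2f} "
                  f"power={power:.2f} false_rate={false_rate:.3f}")
```

Because the Pareto distribution is heavily right-skewed, the standard fence also flags a noticeable share of uncontaminated observations, which is one reason to compare it against boxplot rules designed for skewed data.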
Pages: 6
Related papers
50 in total
  • [21] On determining the number of outliers in exponential and Pareto samples
    Jeevanand, ES
    Nair, NU
    STATISTICAL PAPERS, 1998, 39 (03) : 277 - 290
  • [22] On determining the number of outliers in exponential and Pareto samples
    E. S. Jeevanand
    N. Unnikrishnan Nair
    Statistical Papers, 1998, 39 : 277 - 290
  • [23] Uniformly Most Powerful CFAR Test for Pareto-Target Detection in Pareto Distributed Clutter
    Gali, John Bob
    Ray, Priyadip
    Das, Goutam
    2020 TWENTY SIXTH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC 2020), 2020,
  • [24] Biogeography Based Optimization for Distributed CFAR Detection in Pareto Clutter
    Zebiri, Khaled
    Mezache, Amar
    Soltani, Faouzi
    Bentoumi, Ahmed
    PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON ELECTRICAL AND INFORMATION TECHNOLOGIES (ICEIT 2017), 2017,
  • [25] Detection of outliers in data streams using grouping methods
    Duraj, Agnieszka
    Chomatek, Lukasz
    PRZEGLAD ELEKTROTECHNICZNY, 2019, 95 (02) : 85 - 87
  • [26] Modeling of activation data in the BrainMap™ database: Detection of outliers
    Nielsen, FÅ
    Hansen, LK
    HUMAN BRAIN MAPPING, 2002, 15 (03) : 146 - 156
  • [27] Distributed Fault Detection using Sensor Networks and Pareto Estimation
    Boem, Francesca
    Xu, Yuzhe
    Fischione, Carlo
    Parisini, Thomas
    2013 EUROPEAN CONTROL CONFERENCE (ECC), 2013, : 932 - 937
  • [28] Outliers Detection Method Using Clustering in Buildings Data
    Habib, Usman
    Zucker, Gerhard
    Bloechle, Max
    Judex, Florian
    Haase, Jan
    IECON 2015 - 41ST ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, 2015, : 694 - 700
  • [29] Online detection of outliers in clusters of continuous data streaming
    Pereira, Mariana A.
    Faria, Elaine R.
    Naldi, Murilo C.
    2017 6TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2017, : 324 - 329
  • [30] ARMAsel for detection and correction of outliers in univariate stochastic data
    Broersen, Piet A. T.
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2008, 57 (03) : 446 - 453