Outlyingness: Which variables contribute most?

Cited by: 8
Authors
Debruyne, Michiel [1]
Hoeppner, Sebastiaan [2]
Serneels, Sven [3]
Verdonck, Tim [2]
Affiliations
[1] Dexia, Credit Risk Modelling, Marsveldpl 5, B-1050 Brussels, Belgium
[2] Katholieke Univ Leuven, Dept Math, Celestijnenlaan 200B, B-3001 Leuven, Belgium
[3] BASF Corp, 540 White Plains Rd, Tarrytown, NY 10591 USA
Keywords
Partial least squares; Robust statistics; Sparsity; Variable selection; Multivariate outliers; Robust; Sparse
DOI
10.1007/s11222-018-9831-5
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Outlier detection is an inevitable step in most statistical data analyses. However, merely detecting an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, typically flag the entire case as outlying or attribute a specific case weight to the entire case. In practice, particularly in high-dimensional data, the outlier will most likely not be outlying along all of its variables, but only along a subset of them. If so, the scientific question of why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness. It thereby helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that this direction can be written as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably by the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well both on simulated data and real-life examples.
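The least squares reformulation mentioned in the abstract can be illustrated with a small NumPy sketch. It assumes one common construction: the direction of maximal outlyingness is taken proportional to the inverse covariance times the deviation from the mean, and the regression uses a 0/1 indicator response that marks the outlier. The function names, the simulated data, and the soft-thresholding step are hypothetical and for illustration only. In particular, the thresholding is a simple stand-in for sparsity, not the SPLS or SNIPLS procedure the paper recommends, and classical rather than robust estimates are used throughout.

```python
import numpy as np

def direction_direct(X_clean, x_out):
    """Unit-norm direction proportional to S^{-1} (x_out - mean)."""
    mean = X_clean.mean(axis=0)
    cov = np.cov(X_clean, rowvar=False)
    a = np.linalg.solve(cov, x_out - mean)
    return a / np.linalg.norm(a)

def direction_least_squares(X_clean, x_out):
    """Same direction from OLS of a 0/1 indicator on the stacked data."""
    Z = np.vstack([X_clean, x_out])
    y = np.zeros(Z.shape[0])
    y[-1] = 1.0                      # 1 marks the outlier, 0 the regular cases
    Zc = Z - Z.mean(axis=0)          # centring plays the role of an intercept
    beta, *_ = np.linalg.lstsq(Zc, y - y.mean(), rcond=None)
    return beta / np.linalg.norm(beta)

def soft_threshold_direction(a, eta):
    """Hypothetical sparsifier (NOT SNIPLS): soft-threshold at eta * max|a|."""
    cut = eta * np.max(np.abs(a))
    s = np.sign(a) * np.maximum(np.abs(a) - cut, 0.0)
    return s / np.linalg.norm(s)

# Simulated data (an assumption for illustration): 6 independent variables,
# with the outlier shifted along variables 1 and 4 only.
rng = np.random.default_rng(0)
X_clean = rng.normal(size=(500, 6))
x_out = X_clean.mean(axis=0).copy()
x_out[[1, 4]] += 8.0

a_direct = direction_direct(X_clean, x_out)
a_ls = direction_least_squares(X_clean, x_out)
a_sparse = soft_threshold_direction(a_ls, eta=0.5)
selected = np.flatnonzero(a_sparse)  # indices of variables kept as contributors

print("directions agree:", np.allclose(a_direct, a_ls))
print("selected variables:", selected)
```

The two directions coincide up to normalisation because adding the outlier to the data changes the scatter matrix only by a rank-one term along the deviation vector, which leaves the direction of the solution unchanged. The nonzero entries of the thresholded direction then point to the variables along which the case lies out.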
Pages: 707-723
Page count: 17