Joint outlier detection and variable selection using discrete optimization

Cited by: 1
Authors
Jammal, Mahdi [1 ,2 ]
Canu, Stephane [1 ]
Abdallah, Maher [3 ]
Affiliations
[1] Inst Natl Sci Appl INSA Rouen, 685 Ave Univ, F-76800 St Etienne Du Rouvray, France
[2] Lebanese Univ, Beirut, Lebanon
[3] Lebanese Univ, Fac Publ Hlth, Beirut, Lebanon
Keywords
Robust optimization; statistical learning; linear regression; variable selection; outlier detection; mixed integer programming; TRIMMED SQUARES REGRESSION; SHRINKAGE; NONCONVEX;
DOI
10.2436/20.8080.02.109
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research];
Discipline classification code
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
In regression, the quality of estimators is known to be very sensitive to the presence of spurious variables and outliers. Unfortunately, this situation is frequent when dealing with real data. To handle outlier proneness and achieve variable selection, we propose a robust method that rejects discordant observations outright while selecting the relevant variables. A natural way to define the corresponding optimization problem is to use the $\ell_0$ norm and recast it as a mixed integer optimization problem. To retrieve the global solution of this problem more efficiently, we suggest the use of additional constraints as well as a clever initialization. To this end, an efficient and scalable non-convex proximal alternate algorithm is introduced. An empirical comparison between the $\ell_0$ norm approach and its $\ell_1$ relaxation is presented as well. Results on both synthetic and real data sets show that the mixed integer programming approach and its discrete first order warm start provide high quality solutions.
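To make the idea in the abstract concrete, here is a minimal Python sketch of an alternating first-order heuristic for joint outlier detection and variable selection. It is not the authors' algorithm. It assumes a mean-shift model y = X beta + gamma + noise, in which a nonzero gamma_i flags observation i as an outlier, and it assumes cardinality bounds ||beta||_0 <= k_v and ||gamma||_0 <= k_o. The names `hard_threshold`, `alternating_l0`, `k_v`, and `k_o` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out


def objective(X, y, beta, gamma):
    """Least-squares loss of the assumed mean-shift model."""
    r = y - X @ beta - gamma
    return 0.5 * float(r @ r)


def alternating_l0(X, y, k_v, k_o, n_iter=500):
    """Illustrative alternating scheme (an assumption, not the paper's method).

    beta : projected gradient step onto {||beta||_0 <= k_v}
    gamma: exact minimization over {||gamma||_0 <= k_o}, i.e. the k_o
           largest residuals in magnitude are absorbed by gamma.
    Assumes 0 <= k_v <= X.shape[1] and 0 <= k_o <= X.shape[0].
    """
    n, p = X.shape
    # Lipschitz constant of the gradient with respect to beta.
    L = np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta + gamma - y)
        beta = hard_threshold(beta - grad / L, k_v)
        gamma = hard_threshold(y - X @ beta, k_o)
    return beta, gamma
```

A point computed this way is only a local heuristic solution. In the spirit of the abstract, it could serve as a warm start for a mixed integer solver, which is the component that can certify global optimality.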
Pages: 47 - 66
Number of pages: 20
Related papers
50 in total
  • [41] Interactive algorithm for the selection of dimensions in Outlier detection
    Boudjeloud-Assala, Lydia
    Poulet, François
    Revue d'Intelligence Artificielle, 2008, 22 (3-4) : 401 - 420
  • [42] Unsupervised Feature Selection for Outlier Detection in Categorical Data using Mutual Information
    Suri, N. N. R. Ranga
    Murty, M. Narasimha
    Athithan, G.
    2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012: 253 - 258
  • [43] Covariance Based Outlier Detection with Feature Selection
    Zwilling, Chris E.
    Wang, Michelle Y.
    2016 38TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2016: 2606 - 2609
  • [44] On normalization and algorithm selection for unsupervised outlier detection
    Sevvandi Kandanaarachchi
    Mario A. Muñoz
    Rob J. Hyndman
    Kate Smith-Miles
    Data Mining and Knowledge Discovery, 2020, 34 : 309 - 354
  • [45] IFODHD: Improved Feature Selection Based Outlier Detection using Hyperdimensional Computing
    Xu, Wenrui
    Krainess, Evan
    Payani, Ali
    Latapie, Hugo
    Parhi, Keshab K.
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2025: 791 - 807
  • [46] Discrete Nonparametric Algorithms for Outlier Detection with Genomic Data
    Ghosh, Debashis
    JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2010, 20 (02) : 193 - 208
  • [47] A method for simultaneous variable selection and outlier identification in linear regression
    Hoeting, J
    Raftery, AE
    Madigan, D
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1996, 22 (03) : 251 - 270
  • [48] STEPWISE DISCRETE VARIABLE SELECTION PROCEDURE
    GOLDSTEIN, M
    DILLON, WR
    COMMUNICATIONS IN STATISTICS PART A-THEORY AND METHODS, 1977, 6 (14) : 1423 - 1436
  • [49] Joint likelihood estimation and model order selection for outlier censoring
    Karbasi, Syed M.
    IET RADAR SONAR AND NAVIGATION, 2021, 15 (06) : 561 - 573
  • [50] Regression-Based Outlier Detection of Sensor Measurements Using Independent Variable Synthesis
    Park, Chang Mok
    Jeon, Jesung
    DATA SCIENCE, 2015, 9208 : 78 - 86