Classification of mislabelled microarrays using robust sparse logistic regression

Cited by: 31
Authors
Bootkrajang, Jakramate [1]
Kaban, Ata [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Keywords
DISCRIMINANT-ANALYSIS; INITIAL SAMPLES; GENE SELECTION; CANCER;
DOI
10.1093/bioinformatics/btt078
CLC number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance depends crucially on some tuning parameters.
Results: In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is set automatically using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach counters the adverse effects of labelling errors on predictive performance, is effective at identifying marker genes, and simultaneously detects mislabelled arrays with high accuracy.
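The label-flipping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flip-probability matrix `gamma`, and the fixed L1 weight `lam` are assumptions made here for illustration, and the fixed penalty merely stands in for the Bayesian regularization the abstract describes. The sketch assumes binary labels coded 0/1 and models the observed label as a noisy copy of the true label.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_label_probs(X, w, gamma):
    """P(observed label = k | x) under a label-flipping model.

    gamma[j, k] is the assumed probability that a true label j
    is recorded as k (each row of gamma sums to 1).
    """
    p1 = sigmoid(X @ w)                          # P(true = 1 | x)
    p_true = np.column_stack([1.0 - p1, p1])     # shape (n, 2)
    return p_true @ gamma                        # shape (n, 2)

def penalized_nll(X, y_obs, w, gamma, lam):
    """Negative log-likelihood of the observed labels plus an L1 penalty.

    lam is a hypothetical fixed penalty weight, used here only as a
    stand-in for automatic Bayesian regularization.
    """
    probs = noisy_label_probs(X, w, gamma)
    log_lik = np.log(probs[np.arange(len(y_obs)), y_obs])
    return -log_lik.sum() + lam * np.abs(w).sum()

def flip_posterior(X, y_obs, w, gamma):
    """P(true label != observed label | x, observed label) per sample."""
    p1 = sigmoid(X @ w)
    p_true = np.column_stack([1.0 - p1, p1])
    # joint[i, j] = P(true = j | x_i) * P(observed = y_obs[i] | true = j)
    joint = p_true * gamma[:, y_obs].T
    post = joint / joint.sum(axis=1, keepdims=True)
    return 1.0 - post[np.arange(len(y_obs)), y_obs]
```

In this sketch, minimizing `penalized_nll` over `w` (and, if desired, over `gamma`) would give a sparse classifier that tolerates flipped labels, and samples with a large `flip_posterior` value would be candidates for mislabelled arrays. With `gamma` set to the identity matrix the model reduces to ordinary L1-penalized logistic regression.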
Pages: 870 - 877
Page count: 8
Related papers
50 in total
  • [21] SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPARSE LOGISTIC REGRESSION AND SPATIAL-TV REGULARIZATION
    Sun, Le
    Wu, Zebin
    Liu, Jianjun
    Wei, Zhihui
    2013 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2013, : 1019 - 1022
  • [22] Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data
    Kim, Yongdai
    Kwon, Sunghoon
    Song, Seuck Heun
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2006, 51 (03) : 1643 - 1655
  • [23] Visual Tracking Using Logistic Regression and Sparse Representation
    Wang, Heya
    Wang, Fuxiang
    2014 7TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING (CISP 2014), 2014, : 66 - 72
  • [24] Robust Logistic Regression using Shift Parameters
    Tibshirani, Julie
    Manning, Christopher D.
    PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2014, : 124 - 129
  • [25] Prediction of siRNA Potency Using Sparse Logistic Regression
    Hu, Wei
    Hu, John
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2014, 21 (06) : 420 - 427
  • [26] High dimensional classification with combined adaptive sparse PLS and logistic regression
    Durif, Ghislain
    Modolo, Laurent
    Michaelsson, Jakob
    Mold, Jeff E.
    Lambert-Lacroix, Sophie
    Picard, Franck
    BIOINFORMATICS, 2018, 34 (03) : 485 - 493
  • [27] Sparse logistic regression for whole-brain classification of fMRI data
    Ryali, Srikanth
    Supekar, Kaustubh
    Abrams, Daniel A.
    Menon, Vinod
    NEUROIMAGE, 2010, 51 (02) : 752 - 764
  • [28] On Regularized Sparse Logistic Regression
    Zhang, Mengyuan
    Liu, Kai
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 1535 - 1540
  • [29] Texture classification using kernel logistic regression
    Tambo, Asongu L.
    Mistry, Rajan B.
    Campbell, Jonathan M.
    Chan, Sherwin R.
    Hang, Xiyi
    INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 259 - 262
  • [30] Multiple Classification Using Logistic Regression Model
    Zou, Baoping
    INTERNET OF VEHICLES - TECHNOLOGIES AND SERVICES, 2016, 10036 : 238 - 243