Classification of mislabelled microarrays using robust sparse logistic regression

Cited by: 31
Authors
Bootkrajang, Jakramate [1]
Kaban, Ata [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Keywords
DISCRIMINANT-ANALYSIS; INITIAL SAMPLES; GENE SELECTION; CANCER;
DOI
10.1093/bioinformatics/btt078
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters.

Results: In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach counters the adverse effects of labelling errors on predictive performance, identifies marker genes effectively, and simultaneously detects mislabelled arrays with high accuracy.
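
The label-flipping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary labels, a fixed weight vector `w`, and a known 2x2 flip matrix `gamma` (where `gamma[j, k]` is the probability of observing label `k` when the true label is `j`), and it leaves out the sparsity prior and the Bayesian regularization the abstract mentions. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def true_label_probs(X, w):
    """P(true label = j | x) for j in {0, 1} under plain logistic regression."""
    p1 = sigmoid(X @ w)
    return np.column_stack([1.0 - p1, p1])          # shape (n, 2)

def noisy_label_probs(X, w, gamma):
    """P(observed label = k | x), mixing true-label probabilities through gamma.

    gamma[j, k] = P(observed = k | true = j); each row is assumed to sum to 1.
    """
    return true_label_probs(X, w) @ gamma            # shape (n, 2)

def neg_log_likelihood(X, y_obs, w, gamma):
    """Negative log-likelihood of the observed (possibly flipped) labels."""
    probs = noisy_label_probs(X, w, gamma)
    return -np.sum(np.log(probs[np.arange(len(y_obs)), y_obs]))

def mislabel_posterior(X, y_obs, w, gamma):
    """P(true label != observed label | x, observed label) for each sample."""
    p_true = true_label_probs(X, w)
    # joint[i, j] = P(true = j | x_i) * P(observed = y_i | true = j)
    joint = p_true * gamma[:, y_obs].T
    post = joint / joint.sum(axis=1, keepdims=True)
    return 1.0 - post[np.arange(len(y_obs)), y_obs]

# Toy usage: two one-dimensional samples whose observed labels
# disagree with the sign of the classifier's score.
X = np.array([[2.0], [-2.0]])
w = np.array([1.0])
y_obs = np.array([0, 1])
gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
suspicion = mislabel_posterior(X, y_obs, w, gamma)
```

In a sketch like this, samples with a high `suspicion` value are candidates for mislabelled arrays; in a full method, `w` and `gamma` would be estimated jointly from the data rather than fixed in advance.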
Pages: 870-877
Number of pages: 8