Classification of mislabelled microarrays using robust sparse logistic regression

Cited by: 31
Authors
Bootkrajang, Jakramate [1]
Kaban, Ata [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Keywords
DISCRIMINANT-ANALYSIS; INITIAL SAMPLES; GENE SELECTION; CANCER
DOI
10.1093/bioinformatics/btt078
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Previous studies have reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance depends crucially on some tuning parameters.
Results: In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is set automatically using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the adverse effects of labelling errors on predictive performance, that it is effective at identifying marker genes, and that it simultaneously detects mislabelled arrays with high accuracy.
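The abstract describes a label-flipping process built into a logistic regression classifier but does not give its equations. The following is a minimal sketch of one common way to write such a model, under stated assumptions: binary labels, a 2x2 flip matrix `gamma` with `gamma[j, k]` taken to be P(observed = k | true = j), and a plain L1 penalty with a hand-set weight `lam` standing in for the Bayesian regularization mentioned in the abstract. All function and parameter names are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_label_probs(X, w, gamma):
    """P(observed label = 1 | x) under the assumed label-flipping model."""
    p1 = sigmoid(X @ w)          # P(true label = 1 | x)
    p0 = 1.0 - p1                # P(true label = 0 | x)
    # Marginalise over the unknown true label.
    return gamma[0, 1] * p0 + gamma[1, 1] * p1

def penalised_nll(X, y_obs, w, gamma, lam):
    """Negative log-likelihood of the observed labels plus an L1 penalty.

    The L1 term is an assumed stand-in for a sparsity-inducing prior.
    """
    q1 = noisy_label_probs(X, w, gamma)
    eps = 1e-12                  # guards against log(0)
    ll = y_obs * np.log(q1 + eps) + (1 - y_obs) * np.log(1 - q1 + eps)
    return -ll.sum() + lam * np.abs(w).sum()

def flip_posterior(X, y_obs, w, gamma):
    """P(true label != observed label | x, observed label) per sample.

    y_obs must be an integer array of 0/1 labels.
    """
    p1 = sigmoid(X @ w)
    p_true = np.stack([1.0 - p1, p1], axis=1)   # shape (n, 2)
    # joint[i, j] = P(true = j | x_i) * P(observed = y_obs[i] | true = j)
    joint = p_true * gamma[:, y_obs].T
    post = joint / joint.sum(axis=1, keepdims=True)
    return 1.0 - post[np.arange(len(y_obs)), y_obs]
```

In a sketch like this, samples with a high `flip_posterior` value would be the candidates for mislabelled arrays, while the nonzero entries of `w` after minimising `penalised_nll` would point to candidate marker genes. Estimating `gamma` jointly with `w` is left out here.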
Pages: 870-877
Number of pages: 8
Related papers
50 in total
  • [1] Robust and sparse logistic regression
    Cornilly, Dries
    Tubex, Lise
    Van Aelst, Stefan
    Verdonck, Tim
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, 18 (03) : 663 - 679
  • [2] Gene expression data classification with robust sparse logistic regression using fused regularisation
    Lavanya, Kampa
    Rambabu, Pemula
    Suresh, G. Vijay
    Bhandari, Rahul
    INTERNATIONAL JOURNAL OF AD HOC AND UBIQUITOUS COMPUTING, 2023, 42 (04) : 281 - 291
  • [3] Classification of gene microarrays by penalized logistic regression
    Zhu, J
    Hastie, T
    BIOSTATISTICS, 2004, 5 (03) : 427 - 443
  • [4] Robust Logistic Regression and Classification
    Feng, Jiashi
    Xu, Huan
    Mannor, Shie
    Yan, Shuicheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [5] Penalized robust estimators in sparse logistic regression
    Bianco, Ana M.
    Boente, Graciela
    Chebi, Gonzalo
    TEST, 2022, 31 (03) : 563 - 594
  • [6] Penalized robust estimators in sparse logistic regression
    Ana M. Bianco
    Graciela Boente
    Gonzalo Chebi
    TEST, 2022, 31 : 563 - 594
  • [7] Approximate Sparse Multinomial Logistic Regression for Classification
    Kayabol, Koray
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (02) : 490 - 493
  • [8] Multiclass Classification by Sparse Multinomial Logistic Regression
    Abramovich, Felix
    Grinshtein, Vadim
    Levy, Tomer
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2021, 67 (07) : 4637 - 4646
  • [9] Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification
    Cao, Faxian
    Yang, Zhijing
    Ren, Jinchang
    Ling, Wing-Kuen
    Zhao, Huimin
    Marshall, Stephen
    REMOTE SENSING, 2017, 9 (12)
  • [10] Doubly robust logistic regression for image classification
    Song, Zihao
    Wang, Lei
    Xu, Xiangjian
    Zhao, Weihua
    APPLIED MATHEMATICAL MODELLING, 2023, 123 : 430 - 446