Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions

Cited by: 221
Authors
Somorjai, RL [1]
Dolenko, B [1]
Baumgartner, R [1]
Affiliations
[1] Natl Res Council Canada, Inst Biodiagnost, Winnipeg, MB R3B 1Y6, Canada
DOI
10.1093/bioinformatics/btg182
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Two practical realities constrain the analysis of microarray data, mass spectra from proteomics, and biomedical infrared or magnetic resonance spectra. One is the 'curse of dimensionality': the number of features characterizing these data is in the thousands or tens of thousands. The other is the 'curse of dataset sparsity': the number of samples is limited. The consequences of these two curses are far-reaching when such data are used to classify the presence or absence of disease.
Results: Using very simple classifiers, we show for several publicly available microarray and proteomics datasets how these curses influence classification outcomes. In particular, even if the sample-to-feature ratio is increased to the recommended 5-10 by feature extraction/reduction methods, dataset sparsity can render any classification result statistically suspect. In addition, several 'optimal' feature sets are typically identifiable for sparse datasets, all producing perfect classification results for both the training and independent validation sets. This non-uniqueness leads to interpretational difficulties and casts doubt on the biological relevance of any of these 'optimal' feature sets. We suggest an approach to assess the relative quality of apparently equally good classifiers.
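The non-uniqueness described in the abstract can be illustrated with a minimal sketch that is not the authors' method or data. It assumes pure Gaussian noise features with no real class signal, a hypothetical dataset of 3 training and 2 validation samples per class, 20000 features, and a one-feature nearest-class-mean classifier. The function names and all sizes are illustrative assumptions. Because the features carry no information, any feature that classifies both sets perfectly does so by chance alone.

```python
import random


def nearest_mean_predict(x, mean0, mean1):
    """Assign x to the class whose training mean is closer (ties go to class 0)."""
    return 1 if abs(x - mean1) < abs(x - mean0) else 0


def perfect_noise_features(n_features=20000, n_train_per_class=3,
                           n_val_per_class=2, seed=0):
    """Return indices of pure-noise features that classify every training
    and validation sample correctly. All sizes are hypothetical."""
    rng = random.Random(seed)
    y_train = [0] * n_train_per_class + [1] * n_train_per_class
    y_val = [0] * n_val_per_class + [1] * n_val_per_class
    perfect = []
    for j in range(n_features):
        # Each feature is independent standard normal noise, unrelated to labels.
        x_train = [rng.gauss(0.0, 1.0) for _ in y_train]
        x_val = [rng.gauss(0.0, 1.0) for _ in y_val]
        # "Train" the one-feature classifier: one mean per class.
        mean0 = sum(x for x, y in zip(x_train, y_train) if y == 0) / n_train_per_class
        mean1 = sum(x for x, y in zip(x_train, y_train) if y == 1) / n_train_per_class
        train_ok = all(nearest_mean_predict(x, mean0, mean1) == y
                       for x, y in zip(x_train, y_train))
        val_ok = all(nearest_mean_predict(x, mean0, mean1) == y
                     for x, y in zip(x_val, y_val))
        if train_ok and val_ok:
            perfect.append(j)
    return perfect


if __name__ == "__main__":
    hits = perfect_noise_features()
    print(f"{len(hits)} of 20000 noise features are 'perfect' on both sets")
```

The exact count depends on the seed and the assumed sizes. The point of the sketch is only that, with so few samples and so many candidate features, perfect training and validation accuracy does not by itself show that a feature is biologically relevant.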
Pages: 1484-1491
Number of pages: 8