Analysis of complexity indices for classification problems: Cancer gene expression data

Cited by: 41
Authors
Lorena, Ana C.
Costa, Ivan G. [1]
Spolaor, Newton
de Souto, Marcilio C. P. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
Keywords
Classification; Gene expression data; Complexity indices; Linear separability; BREAST-CANCER; MICROARRAY; SENSITIVITY; PREDICTION; ALGORITHMS; SELECTION; RANKING;
DOI
10.1016/j.neucom.2011.03.054
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Currently, cancer diagnosis at a molecular level has been made possible through the analysis of gene expression data. More specifically, one usually uses machine learning (ML) techniques to build, from cancer gene expression data, automatic diagnosis models (classifiers). Cancer gene expression data often present some characteristics that can have a negative impact on the generalization ability of the classifiers generated. Some of these properties are data sparsity and an unbalanced class distribution. We investigate the results of a set of indices able to extract the intrinsic complexity information from the data. Such measures can be used to analyze, among other things, which particular characteristics of cancer gene expression data most affect the prediction ability of support vector machine classifiers. In this context, we also show that, by applying a proper feature selection procedure to the data, one can reduce the influence of those characteristics on the error rates of the classifiers induced. © 2011 Elsevier B.V. All rights reserved.
Pages: 33-42
Number of pages: 10