Analysis of complexity indices for classification problems: Cancer gene expression data

Cited by: 41
Authors
Lorena, Ana C.
Costa, Ivan G. [1]
Spolaor, Newton
de Souto, Marcilio C. P. [1]
Affiliation
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
Keywords
Classification; Gene expression data; Complexity indices; Linear separability; BREAST-CANCER; MICROARRAY; SENSITIVITY; PREDICTION; ALGORITHMS; SELECTION; RANKING
DOI
10.1016/j.neucom.2011.03.054
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Cancer diagnosis at the molecular level has become possible through the analysis of gene expression data. More specifically, machine learning (ML) techniques are usually employed to build automatic diagnosis models (classifiers) from cancer gene expression data. However, such data often present characteristics that can harm the generalization ability of the resulting classifiers, among them data sparsity and an unbalanced class distribution. We investigate the results of a set of indices able to extract intrinsic complexity information from the data. Such measures can be used to analyze, among other things, which characteristics of cancer gene expression data most strongly affect the predictive ability of support vector machine classifiers. In this context, we also show that applying a proper feature selection procedure to the data can reduce the influence of those characteristics on the error rates of the induced classifiers. (C) 2011 Elsevier B.V. All rights reserved.
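The abstract does not spell out which complexity indices are used. Purely as an illustration, the sketch below computes three simple measures of the kind it refers to: the maximum Fisher's discriminant ratio (F1) and the samples-per-dimension ratio (T2), both from the measure set described by Ho and Basu, plus a plain majority/minority class ratio for imbalance. It assumes a two-class problem and population variances; whether the paper uses these exact definitions is an assumption. The `select_top_k` filter is a hypothetical stand-in for a feature selection step, not the procedure used in the paper, and the toy data are made up.

```python
from collections import Counter
from statistics import fmean, pvariance


def per_feature_fisher(X, y):
    """Fisher's discriminant ratio (mu1 - mu2)^2 / (var1 + var2) for each feature.

    Two-class case only. X is a list of samples (each a list of feature values),
    y is the matching list of class labels.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [x for x, lab in zip(X, y) if lab == labels[0]]
    b = [x for x, lab in zip(X, y) if lab == labels[1]]
    ratios = []
    for j in range(len(X[0])):
        col_a = [x[j] for x in a]
        col_b = [x[j] for x in b]
        num = (fmean(col_a) - fmean(col_b)) ** 2
        den = pvariance(col_a) + pvariance(col_b)
        if den == 0:
            # Constant within each class: perfectly separating if the means differ.
            ratios.append(float("inf") if num > 0 else 0.0)
        else:
            ratios.append(num / den)
    return ratios


def f1_max_fisher(X, y):
    """F1: the largest per-feature Fisher ratio (higher = easier problem)."""
    return max(per_feature_fisher(X, y))


def t2_points_per_dimension(X):
    """T2: number of samples per feature (lower = sparser data)."""
    return len(X) / len(X[0])


def imbalance_ratio(y):
    """Majority-class size divided by minority-class size (1.0 = balanced)."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def select_top_k(X, y, k):
    """Hypothetical filter: keep the k features with the highest Fisher ratio."""
    ratios = per_feature_fisher(X, y)
    keep = sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)[:k]
    keep.sort()  # preserve the original feature order
    return [[x[j] for j in keep] for x in X]


if __name__ == "__main__":
    # Made-up toy data: 5 samples, 2 features, 3 "normal" vs 2 "tumor".
    X = [[1, 1], [2, 3], [3, 2], [5, 2], [7, 4]]
    y = ["normal", "normal", "normal", "tumor", "tumor"]
    print(f1_max_fisher(X, y))             # about 9.6, from feature 0
    print(t2_points_per_dimension(X))      # 2.5
    print(imbalance_ratio(y))              # 1.5
    X_sel = select_top_k(X, y, k=1)
    print(t2_points_per_dimension(X_sel))  # 5.0
```

In microarray data there are typically far fewer samples than genes, so T2 falls well below 1. Keeping only a small number of discriminative genes raises T2, which gives one intuition for why feature selection can reduce the effect of data sparsity on classifier error.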
Pages: 33-42
Page count: 10
Related papers (50 in total)
  • [1] Dataset complexity and gene expression based cancer classification
    Okun, Oleg
    Priisalu, Helen
    APPLICATIONS OF FUZZY SETS THEORY, 2007, 4578 : 484 - +
  • [2] Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets
    Costa, Ivan G.
    Lorena, Ana C.
    Peres, Liciana R. M. P. y
    de Souto, Marcilio C. P.
    ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS, 2009, 5676 : 48 - +
  • [3] Cancer classification using gene expression data
    Lu, Y
    Han, JW
    INFORMATION SYSTEMS, 2003, 28 (04) : 243 - 268
  • [4] Cancer Classification Using Gene Expression Data
    Sonsare, Pravinkumar
    Mujumdar, Aarya
    Joshi, Pranjali
    Morayya, Nipun
    Hablani, Sachal
    Khergade, Vedant
    SMART TRENDS IN COMPUTING AND COMMUNICATIONS, VOL 1, SMARTCOM 2024, 2024, 945 : 1 - 11
  • [5] Analysis of Gene Expression Cancer Data Set: Classification of TCGA Pan-cancer HiSeq Data
    Nitta, Yusaku
    Borders, Mitchell
    Ludwig, Simone A.
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 4745 - 4752
  • [6] A genetic filter for cancer classification on gene expression data
    Kim, Yong-Hyuk
    Yoon, Yourim
    BIO-MEDICAL MATERIALS AND ENGINEERING, 2015, 26 : S1993 - S2002
  • [7] Feature Selection and Classification in gene expression cancer data
    Pavithra, D.
    Lakshmanan, B.
    2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017,
  • [8] Classification of Cancer Types based on Gene Expression Data
    He, Yinchao
    Bockmon, Ryan
    Modey, Miracle
    Roscoe, Sarah
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 2175 - 2182
  • [9] Measurement of Data Complexity for Classification Problems with Unbalanced Data
    Anwar, Nafees
    Jones, Geoff
    Ganesh, Siva
    STATISTICAL ANALYSIS AND DATA MINING, 2014, 7 (03) : 194 - 211
  • [10] Cancer Classification Analysis for Microarray Gene Expression Data by Integrating Wavelet Transform and Visual Analysis
    Ji, Soo-Yeon
    Jeong, Dong Hyun
    2020 IEEE 20TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE 2020), 2020, : 17 - 22