An empirical comparison of dimensionality reduction methods for classifying gene and protein expression datasets

Cited by: 0
Authors
Lee, George [1 ]
Rodriguez, Carlos [2 ]
Madabhushi, Anant [1 ]
Affiliations
[1] Rutgers State Univ, Dept Biomed Engn, Piscataway, NJ 08854 USA
[2] Univ Puerto Rico, Mayaguez, PR 00681 USA
Keywords
dimensionality reduction; bioinformatics; gene expression; proteomics; classification; prostate cancer; lung cancer; ovarian cancer; principal component analysis; linear discriminant analysis; multidimensional scaling; graph embedding; Isomap; locally linear embedding
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The recent explosion in the availability of gene and protein expression data for cancer detection has necessitated the development of sophisticated machine learning tools for high-dimensional data analysis. Previous attempts at gene expression analysis have typically used a linear dimensionality reduction method such as Principal Component Analysis (PCA). Linear dimensionality reduction methods do not, however, account for the inherent nonlinearity within the data. The motivation behind this work is to demonstrate that nonlinear dimensionality reduction methods capture this nonlinearity better than linear methods, and hence should yield better classification and potentially aid in the visualization and identification of new data classes. Consequently, in this paper, we empirically compare the performance of three commonly used linear and three nonlinear dimensionality reduction techniques from the perspective of (a) distinguishing objects belonging to cancer and non-cancer classes and (b) new class discovery in high-dimensional gene and protein expression studies for different types of cancer. Quantitative evaluation using a support vector machine and a decision tree classifier revealed a statistically significant improvement in classification accuracy when using nonlinear rather than linear dimensionality reduction methods.
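The abstract's central claim, that linear methods miss nonlinear structure in the data, can be illustrated with a small toy example. The sketch below is not the authors' implementation and does not use their datasets. It assumes synthetic points on a unit semicircle and compares the straight-line (Euclidean) distance between the two endpoints with the neighborhood-graph geodesic distance, which is the quantity Isomap (one of the nonlinear methods listed in the keywords) uses in place of Euclidean distance. The helper names `knn_graph` and `geodesic_distances` and the choice `k=2` are illustrative assumptions.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by distance to point i and keep the k closest.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from `source` along the graph edges."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (an assumption, not from the paper): 50 samples on a unit semicircle.
n_points = 50
points = [
    (math.cos(math.pi * i / (n_points - 1)), math.sin(math.pi * i / (n_points - 1)))
    for i in range(n_points)
]

graph = knn_graph(points, k=2)
euclidean = math.dist(points[0], points[-1])
geodesic = geodesic_distances(graph, 0)[n_points - 1]

print(f"Euclidean distance between endpoints: {euclidean:.3f}")
print(f"Graph geodesic distance between endpoints: {geodesic:.3f}")
```

On this toy curve the Euclidean distance between the endpoints is the diameter (2), while the graph geodesic is close to the arc length (just under π). A linear projection built from Euclidean distances treats the two endpoints as closer than they are along the curve, which is the kind of distortion the abstract attributes to linear methods.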
Pages: 170 / +
Page count: 3
Related papers
50 in total
  • [21] A Comparison of Dimensionality Reduction Methods for Large Biological Data
    Babjac, Ashley
    Royalty, Taylor
    Steen, Andrew D.
    Emrich, Scott J.
    13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022, 2022,
  • [22] Comparison of dimensionality reduction methods for TCM symptom information
    Fu, Haoyang
    Li, Jingbo
    Liang, Likeng
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 1861 - 1865
  • [23] Enhanced Dimensionality Reduction Methods for Classifying Malaria Vector Dataset using Decision Tree
    Arowolo, Micheal Olaolu
    Adebiyi, Marion Olubunmi
    Adebiyi, Ayodele Ariyo
    SAINS MALAYSIANA, 2021, 50 (09): : 2579 - 2589
  • [24] Variation in large feeding biomechanics datasets visualized using different dimensionality reduction methods
    Orsbon, C. P.
    Nakamura, Y.
    Kijak, N. A.
    Palmer, S. E.
    Ross, C. F.
    INTEGRATIVE AND COMPARATIVE BIOLOGY, 2016, 56 : E345 - E345
  • [25] An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions
    Sangseob Leem
    Taesung Park
    BMC Genomics, 18
  • [26] An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions
    Leem, Sangseob
    Park, Taesung
    BMC GENOMICS, 2017, 18
  • [27] Comparison of Classification and Dimensionality Reduction Methods Used in fMRI Decoding
    Alamdari, Nasim T.
    Fatemizadeh, Emad
    2013 8TH IRANIAN CONFERENCE ON MACHINE VISION & IMAGE PROCESSING (MVIP 2013), 2013, : 175 - 179
  • [28] A Comparison of Dimensionality Reduction Methods Using Topology Preservation Indexes
    de Medeiros, Claudio J. F.
    Ferreira Costa, Jose Alfredo
    Silva, Leandro A.
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2011, 2011, 6936 : 437 - 445
  • [29] Dimensionality Reduction in Boolean Data: Comparison of Four BMF Methods
    Bartl, Eduard
    Belohlavek, Radim
    Osicka, Petr
    Rezankova, Hana
    CLUSTERING HIGH-DIMENSIONAL DATA, CHDD 2012, 2015, 7627 : 118 - 133
  • [30] Comparison between Two Dimensionality Reduction Methods in Time Series
    Zhang, Hanwen
    REVISTA COLOMBIANA DE ESTADISTICA, 2009, 32 (02): : 189 - 212