A Comparison of Dimensionality Reduction Methods for Large Biological Data

Cited by: 0
Authors
Babjac, Ashley [1]
Royalty, Taylor [2]
Steen, Andrew D. [3]
Emrich, Scott J. [1]
Affiliations
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA
[2] Univ Tennessee, Dept Earth & Planetary Sci, Knoxville, TN USA
[3] Univ Tennessee, Dept Microbiol, Knoxville, TN 37996 USA
Keywords
autoencoders; dimensionality reduction; classification;
DOI
10.1145/3535508.3545536
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Large-scale data often suffer from the curse of dimensionality and the constraints associated with it; therefore, dimensionality reduction is often applied as a preprocessing step in machine learning pipelines. In this paper, we directly compare the performance of autoencoders as a dimensionality reduction technique (via the latent space) with other established methods: PCA, LASSO, and t-SNE. To make the comparison robust, we use four distinct datasets that vary in feature types, metadata, labels, and size. We test prediction capability using both Support Vector Machines (SVM) and Random Forests (RF). Significantly, we conclude that autoencoders perform equivalently to the previously established dimensionality reduction methods, and often outperform them in both prediction accuracy and run time when condensing large, sparse datasets.
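The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the four biological datasets, covers only PCA and an autoencoder (LASSO and t-SNE are omitted), and approximates the autoencoder with a single-hidden-layer `MLPRegressor` trained to reconstruct its input. The helper names `autoencoder_latent` and `compare_reducers`, and all hyperparameters, are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def autoencoder_latent(ae, X):
    """Return the hidden-layer (latent) activations of a fitted
    single-hidden-layer MLPRegressor that uses ReLU activation."""
    return np.maximum(0.0, X @ ae.coefs_[0] + ae.intercepts_[0])


def compare_reducers(n_components=10, seed=0):
    """Reduce synthetic data with PCA and an autoencoder, then score
    SVM and RF classifiers on each reduced representation."""
    # Synthetic stand-in for a high-dimensional dataset (assumption).
    X, y = make_classification(n_samples=600, n_features=100,
                               n_informative=15, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)

    # Fit the scaler on training data only to avoid leakage.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    pca = PCA(n_components=n_components, random_state=seed).fit(X_train)

    # Autoencoder stand-in: the network learns to reconstruct its input,
    # and the hidden layer acts as the latent space.
    ae = MLPRegressor(hidden_layer_sizes=(n_components,), activation="relu",
                      max_iter=500, random_state=seed)
    ae.fit(X_train, X_train)

    reduced = {
        "pca": (pca.transform(X_train), pca.transform(X_test)),
        "autoencoder": (autoencoder_latent(ae, X_train),
                        autoencoder_latent(ae, X_test)),
    }
    classifiers = {
        "svm": lambda: SVC(kernel="rbf"),
        "rf": lambda: RandomForestClassifier(n_estimators=100,
                                             random_state=seed),
    }

    scores = {}
    for reducer_name, (Z_train, Z_test) in reduced.items():
        for clf_name, make_clf in classifiers.items():
            clf = make_clf().fit(Z_train, y_train)
            scores[(reducer_name, clf_name)] = clf.score(Z_test, y_test)
    return scores


if __name__ == "__main__":
    for key, accuracy in compare_reducers().items():
        print(key, round(accuracy, 3))
```

The accuracies this prints depend on the synthetic data and the chosen hyperparameters, so they say nothing about the results reported in the paper; the sketch only shows the shape of a reducer-by-classifier comparison.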
Pages: 7
Related papers
50 in total
  • [21] Dimensionality Reduction Methods for Brain Imaging Data Analysis
    Tang, Yunbo
    Chen, Dan
    Li, Xiaoli
    ACM COMPUTING SURVEYS, 2021, 54 (04)
  • [22] Review of classical dimensionality reduction and sample selection methods for large-scale data processing
    Xu, Xinzheng
    Liang, Tianming
    Zhu, Jiong
    Zheng, Dong
    Sun, Tongfeng
    NEUROCOMPUTING, 2019, 328 : 5 - 15
  • [23] Performance comparison of dimensionality reduction methods on RNA-Seq data from the GTEx project
    Ho-Sik Seok
    Genes & Genomics, 2020, 42 : 225 - 234
  • [24] Performance comparison of dimensionality reduction methods on RNA-Seq data from the GTEx project
    Seok, Ho-Sik
    GENES & GENOMICS, 2020, 42 (02) : 225 - 234
  • [25] Performance Comparison of Nonlinear Dimensionality Reduction Methods for Image Data Using Different Distance Measures
    Naseer, Mudasser
    Qin, Shi-Yin
    2008 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, VOLS 1 AND 2, PROCEEDINGS, 2008, : 41 - 46
  • [26] Dimensionality reduction for visualizing high-dimensional biological data
    Malepathirana, Tamasha
    Senanayake, Damith
    Vidanaarachchi, Rajith
    Gautam, Vini
    Halgamuge, Saman
    BIOSYSTEMS, 2022, 220
  • [27] Comparison of various dimensionality methods on the Sabalan magnetotelluric data
    Sarvandani, Mohamadhasan Mohamadian
    Nejati, Ali
    Ghaedrahmati, Reza
    JOURNAL OF APPLIED GEOPHYSICS, 2016, 128 : 179 - 190
  • [28] Comparison of Classification and Dimensionality Reduction Methods Used in fMRI Decoding
    Alamdari, Nasim T.
    Fatemizadeh, Emad
    2013 8TH IRANIAN CONFERENCE ON MACHINE VISION & IMAGE PROCESSING (MVIP 2013), 2013, : 175 - 179
  • [29] A Comparison of Dimensionality Reduction Methods Using Topology Preservation Indexes
    de Medeiros, Claudio J. F.
    Ferreira Costa, Jose Alfredo
    Silva, Leandro A.
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2011, 2011, 6936 : 437 - 445
  • [30] Empirical comparison between autoencoders and traditional dimensionality reduction methods
    Fournier, Quentin
    Aloise, Daniel
    2019 IEEE SECOND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING (AIKE), 2019, : 211 - 214