Variable selection for nonlinear dimensionality reduction of biological datasets through bootstrapping of correlation networks

Authors
Aragones, David G. [1, 2]
Palomino-Segura, Miguel [3, 4, 5]
Sicilia, Jon [3]
Crainiciuc, Georgiana [3]
Ballesteros, Ivan [3]
Sanchez-Cabo, Fatima [6]
Hidalgo, Andres [7, 8]
Calvo, Gabriel F. [1, 2]
Affiliations
[1] Univ Castilla La Mancha, Dept Math, Ciudad Real, Spain
[2] Univ Castilla La Mancha, MOLAB Math Oncol Lab, Ciudad Real, Spain
[3] Ctr Nacl Invest Cardiovasc Carlos III, Area Cell & Dev Biol, Madrid, Spain
[4] Inst Univ Invest Biosanitaria Extremadura INUBE, Immunophysiol Res Grp, Badajoz, Spain
[5] Univ Extremadura, Fac Sci, Dept Physiol, Badajoz, Spain
[6] Ctr Nacl Invest Cardiovasc Carlos III, Bioinformat Unit, Madrid, Spain
[7] Yale Univ, Sch Med, Vasc Biol & Therapeut Program, New Haven, CT USA
[8] Yale Univ, Dept Immunobiol, Sch Med, New Haven, CT USA
Keywords
Artificial intelligence; Machine learning; Unsupervised learning; Feature selection; UMAP; Complex systems; SEQ DATA; CELL; MODEL;
DOI
10.1016/j.compbiomed.2023.107827
Chinese Library Classification number
Q [Biological sciences];
Subject classification code
07; 0710; 09;
Abstract
Identifying the most relevant variables or features in massive datasets for dimensionality reduction can lead to improved and more informative display, faster computation times, and more explainable models of complex systems. Despite significant advances and available algorithms, this task generally remains challenging, especially in unsupervised settings. In this work, we propose a method that constructs correlation networks using all intervening variables and then selects the most informative ones based on network bootstrapping. The method can be applied in both supervised and unsupervised scenarios. We demonstrate its functionality by applying Uniform Manifold Approximation and Projection for dimensionality reduction to several high-dimensional biological datasets, derived from 4D live imaging recordings of hundreds of morpho-kinetic variables, describing the dynamics of thousands of individual leukocytes at sites of prominent inflammation. We compare our method with other standard ones in the field, such as Principal Component Analysis and Elastic Net, showing that it outperforms them. The proposed method can be employed in a wide range of applications, encompassing data analysis and machine learning.
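The abstract does not give the algorithmic details, so the following is only a rough illustration of the general idea of bootstrapping a correlation network to rank variables, not the authors' procedure. It assumes, purely for illustration, that an edge joins two variables when their absolute Pearson correlation reaches a fixed threshold, and that a variable's score is its mean node degree across bootstrap resamples of the rows. The function names, the threshold, and the scoring rule are all hypothetical.

```python
import numpy as np


def bootstrap_network_scores(X, n_boot=200, threshold=0.5, seed=0):
    """Mean node degree of each variable over bootstrapped correlation networks.

    Assumption (not from the paper): edges are |Pearson r| >= threshold,
    and the score is the degree averaged over resamples.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    degree_sum = np.zeros(n_vars)
    for _ in range(n_boot):
        # Resample observations (rows) with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        corr = np.corrcoef(X[idx], rowvar=False)
        # A column that is constant within a resample yields NaN; treat as no edge.
        corr = np.nan_to_num(corr)
        adjacency = np.abs(corr) >= threshold
        np.fill_diagonal(adjacency, False)  # ignore self-correlation
        degree_sum += adjacency.sum(axis=0)
    return degree_sum / n_boot


def select_variables(X, k, **kwargs):
    """Indices of the k variables with the highest bootstrap score."""
    scores = bootstrap_network_scores(X, **kwargs)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    # Synthetic data: columns 0-3 share a latent signal, columns 4-9 are noise.
    rng = np.random.default_rng(42)
    latent = rng.normal(size=(300, 1))
    signal = latent + 0.3 * rng.normal(size=(300, 4))
    noise = rng.normal(size=(300, 6))
    X = np.hstack([signal, noise])
    print(sorted(select_variables(X, k=4).tolist()))
```

Under these assumptions, the retained columns `X[:, selected]` could then be passed to a nonlinear embedding such as UMAP (for example via the `umap-learn` package) in place of the full variable set.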
Pages: 21