Constrained Covariance Matrices With a Biologically Realistic Structure: Comparison of Methods for Generating High-Dimensional Gaussian Graphical Models

Cited by: 5
Authors
Emmert-Streib, Frank [1, 2]
Tripathi, Shailesh [1, 3, 4]
Dehmer, Matthias [3, 5]
Affiliations
[1] Tampere Univ, Fac Informat Technol & Commun Sci, Predict Soc & Data Analyt Lab, Tampere, Finland
[2] Inst Biosci & Med Technol, Tampere, Finland
[3] Univ Appl Sci Upper Austria, Inst Intelligent Prod, Fac Management, Wels, Austria
[4] UMIT, Dept Mech & Biomed Comp Sci, Hall In Tirol, Austria
[5] Nankai Univ, Coll Comp & Control Engn, Tianjin, Peoples R China
Keywords
Gaussian graphical models; network science; machine learning; data science; genomics; gene regulatory networks; statistics
DOI
10.3389/fams.2019.00017
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
High-dimensional data from molecular biology possess an intricate correlation structure that is imposed by the molecular interactions between genes and their products, which form various types of gene networks. This fact is particularly well known for gene expression data, because enough large-scale data sets are available, and amenable to sensible statistical analysis, to confirm it. The purpose of this paper is twofold. First, we investigate three methods for generating constrained covariance matrices with a biologically realistic structure. Such covariance matrices play a pivotal role in designing novel statistical methods for high-dimensional biological data, because they allow the definition of Gaussian graphical models (GGM) for the simulation of realistic data, including its correlation structure. We study local and global characteristics of these covariance matrices and of the derived concentration and partial correlation matrices. Second, we connect these results, obtained from a probabilistic perspective, to statistical results of studies aiming to estimate gene regulatory networks from biological data. This connection sheds light on the well-known heterogeneity of statistical estimation methods for inferring gene regulatory networks and provides an explanation for the difficulties in inferring molecular interactions between highly connected genes.
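As a rough illustration of what defining a GGM from a constrained covariance matrix involves, the Python sketch below builds a concentration (precision) matrix whose zero pattern follows an undirected graph, inverts it to obtain a covariance matrix, and derives the partial correlations. The random graph, the diagonal-dominance construction, the function names, and all parameter values are assumptions made for illustration; this is not claimed to be any of the three methods compared in the paper.

```python
import numpy as np


def random_adjacency(p, edge_prob, rng):
    """Symmetric 0/1 adjacency matrix of a random undirected graph (illustrative only)."""
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)
    return (upper | upper.T).astype(int)


def precision_from_graph(adj, rng, low=0.2, high=0.8, margin=0.1):
    """Precision matrix whose off-diagonal zeros match the missing edges of `adj`.

    Positive definiteness is enforced by strict diagonal dominance, which is
    an assumption of this sketch rather than a method taken from the paper.
    """
    p = adj.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(p, p))
    weights = rng.uniform(low, high, size=(p, p)) * signs
    upper = np.triu(weights * adj, k=1)
    omega = upper + upper.T
    # Each diagonal entry exceeds the absolute row sum, so omega is positive definite.
    np.fill_diagonal(omega, np.abs(omega).sum(axis=1) + margin)
    return omega


def partial_correlations(omega):
    """Partial correlations rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


rng = np.random.default_rng(0)
adj = random_adjacency(p=20, edge_prob=0.1, rng=rng)
omega = precision_from_graph(adj, rng)

sigma = np.linalg.inv(omega)        # covariance matrix implied by the graph
sigma = (sigma + sigma.T) / 2.0     # remove round-off asymmetry before sampling
rho = partial_correlations(omega)

# Simulated data whose correlation structure is constrained by the graph.
samples = rng.multivariate_normal(np.zeros(20), sigma, size=200)
```

In this sketch a zero off-diagonal entry of `omega` corresponds to a missing edge, that is, conditional independence of the two variables given all others, while `sigma` itself is generally dense.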
Pages: 15