LARGE-SCALE MULTIPLE INFERENCE OF COLLECTIVE DEPENDENCE WITH APPLICATIONS TO PROTEIN FUNCTION

Cited by: 0
Authors
Jernigan, Robert [1]
Jia, Kejue [1]
Ren, Zhao [2]
Zhou, Wen [3]
Affiliations
[1] Iowa State Univ, Program Bioinformat & Computat Biol, Dept Biochem Biophys & Mol Biol, Ames, IA 50011 USA
[2] Univ Pittsburgh, Dept Stat, Pittsburgh, PA 15260 USA
[3] Colorado State Univ, Dept Stat, Ft Collins, CO 80523 USA
Source
ANNALS OF APPLIED STATISTICS | 2021 / Vol. 15 / No. 02
Keywords
Collective dependence; false discovery rate; information theoretic measure; multiple testing; protein coevolution; structural biology; GAUSSIAN GRAPHICAL MODEL; FALSE DISCOVERY RATE; GENE-EXPRESSION; INFORMATION; ENTROPY; COEVOLUTION; VARIABILITY; NETWORK;
DOI
10.1214/20-AOAS1431
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification code
020208 ; 070103 ; 0714 ;
Abstract
Measuring the dependence of k >= 3 random variables and drawing inference from such higher-order dependences are scientifically important yet challenging. Motivated here by protein coevolution with multivariate categorical features, we consider an information theoretic measure of higher-order dependence. The proposed collective dependence is a symmetrization of differential interaction information which generalizes the mutual information of a pair of random variables. We show that the collective dependence can be easily estimated and facilitates a test on the dependence of k >= 3 random variables. Upon carefully exploring the null space of collective dependence, we devise a Classification-Assisted Large scaLe inference procedure to DEtect significant k-COllective DEpendence among d >= k random variables, with the false discovery rate controlled. Finite sample performance of our method is examined via simulations. We apply this method to the multiple protein sequence alignment data to study the residue or position coevolution for two protein families, the elongation factor P family and the zinc knuckle family. We identify novel functional triplets of amino acid residues, whose contributions to the protein function are further investigated. These confirm that the collective dependence does yield additional information important for understanding the protein coevolution compared to the pairwise measures.
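As an illustration of the kind of information theoretic quantity the abstract refers to, the sketch below computes a plug-in estimate of the textbook three-way interaction information (co-information) for categorical samples, using I(X;Y;Z) = H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z). This is not the paper's collective dependence measure or its test, whose exact form is not given in this record; the sign convention is one of several in use, and the function names `entropy`, `mutual_information` and `interaction_information` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in (empirical) Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(x, y, z):
    """Plug-in three-way co-information (one common sign convention), in bits.

    Illustrative only: this is the standard textbook quantity, not the
    symmetrized collective dependence proposed in the paper.
    """
    singles = entropy(x) + entropy(y) + entropy(z)
    pairs = (entropy(list(zip(x, y)))
             + entropy(list(zip(x, z)))
             + entropy(list(zip(y, z))))
    triple = entropy(list(zip(x, y, z)))
    return singles - pairs + triple


# Toy data: Z = X XOR Y, with (X, Y) taking each of the four values once.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

print(mutual_information(x, y))          # pairwise dependence of X and Y
print(interaction_information(x, y, z))  # three-way term
```

On this toy XOR data every pairwise mutual information is zero while the three-way term is nonzero (minus one bit under this convention), which is the sort of dependence that pairwise measures alone cannot detect.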
Pages: 902-924
Number of pages: 23
Related papers
50 in total
  • [21] Large-Scale Simultaneous Inference with Hypothesis Testing: Multiple Testing Procedures in Practice
    Emmert-Streib, Frank
    Dehmer, Matthias
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2019, 1 (02): : 653 - 683
  • [22] PARALLEL PROCESSING OF LARGE-SCALE APPLICATIONS ON POWERFUL MULTIPLE PROCESSORS
    MORIARTY, KJM
    VONNEUMANN, J
    INTERNATIONAL JOURNAL OF SUPERCOMPUTER APPLICATIONS AND HIGH PERFORMANCE COMPUTING, 1989, 3 (01): : 82 - 87
  • [23] Large-scale covariate-assisted two-sample inference under dependence
    Wang, Pengfei
    Zhu, Wensheng
    SCANDINAVIAN JOURNAL OF STATISTICS, 2022, 49 (04) : 1421 - 1447
  • [24] Covariate-modulated large-scale multiple testing under dependence
    Wang, Jiangzhou
    Cui, Tingting
    Zhu, Wensheng
    Wang, Pengfei
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2023, 180
  • [25] Scalable Algorithms for Bayesian Inference of Large-Scale Models from Large-Scale Data
    Ghattas, Omar
    Isaac, Tobin
    Petra, Noemi
    Stadler, Georg
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 3 - 6
  • [26] Large-scale prediction of protein structure and function from sequence
    Tosatto, S. C. E.
    Toppo, S.
    CURRENT PHARMACEUTICAL DESIGN, 2006, 12 (17) : 2067 - 2086
  • [27] Screening and selection methods for large-scale analysis of protein function
    Lin, HN
    Cornish, VW
    ANGEWANDTE CHEMIE-INTERNATIONAL EDITION, 2002, 41 (23) : 4403 - 4425
  • [28] Smoothed quantile regression with large-scale inference
    He, Xuming
    Pan, Xiaoou
    Tan, Kean Ming
    Zhou, Wen-Xin
    JOURNAL OF ECONOMETRICS, 2023, 232 (02) : 367 - 388
  • [29] Large-scale inference of conjunctive Bayesian networks
    Montazeri, Hesam
    Kuipers, Jack
    Kouyos, Roger
    Boni, Jurg
    Yerly, Sabine
    Klimkait, Thomas
    Aubert, Vincent
    Gunthard, Huldrych F.
    Beerenwinkel, Niko
    BIOINFORMATICS, 2016, 32 (17) : 727 - 735
  • [30] Algorithm of OMA for large-scale orthology inference
    Roth, Alexander C. J.
    Gonnet, Gaston H.
    Dessimoz, Christophe
    BMC BIOINFORMATICS, 2008, 9 (1)