LARGE-SCALE MULTIPLE INFERENCE OF COLLECTIVE DEPENDENCE WITH APPLICATIONS TO PROTEIN FUNCTION

Cited by: 0
Authors
Jernigan, Robert [1]
Jia, Kejue [1]
Ren, Zhao [2]
Zhou, Wen [3]
Affiliations
[1] Iowa State Univ, Program Bioinformat & Computat Biol, Dept Biochem Biophys & Mol Biol, Ames, IA 50011 USA
[2] Univ Pittsburgh, Dept Stat, Pittsburgh, PA 15260 USA
[3] Colorado State Univ, Dept Stat, Ft Collins, CO 80523 USA
Source
ANNALS OF APPLIED STATISTICS | 2021 / Vol. 15 / No. 2
Keywords
Collective dependence; false discovery rate; information theoretic measure; multiple testing; protein coevolution; structural biology; GAUSSIAN GRAPHICAL MODEL; FALSE DISCOVERY RATE; GENE-EXPRESSION; INFORMATION; ENTROPY; COEVOLUTION; VARIABILITY; NETWORK;
DOI
10.1214/20-AOAS1431
Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Measuring the dependence of k >= 3 random variables and drawing inference from such higher-order dependences are scientifically important yet challenging. Motivated here by protein coevolution with multivariate categorical features, we consider an information theoretic measure of higher-order dependence. The proposed collective dependence is a symmetrization of differential interaction information which generalizes the mutual information of a pair of random variables. We show that the collective dependence can be easily estimated and facilitates a test on the dependence of k >= 3 random variables. Upon carefully exploring the null space of collective dependence, we devise a Classification-Assisted Large scaLe inference procedure to DEtect significant k-COllective DEpendence among d >= k random variables, with the false discovery rate controlled. Finite sample performance of our method is examined via simulations. We apply this method to the multiple protein sequence alignment data to study the residue or position coevolution for two protein families, the elongation factor P family and the zinc knuckle family. We identify novel functional triplets of amino acid residues, whose contributions to the protein function are further investigated. These confirm that the collective dependence does yield additional information important for understanding the protein coevolution compared to the pairwise measures.
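As a rough illustration of why a higher-order measure can carry information that pairwise measures miss, the sketch below computes plug-in estimates of mutual information and of three-way interaction information for categorical samples. It is not the paper's collective dependence, which the abstract describes as a symmetrization of differential interaction information and does not define in full here; the function names, the sign convention I(X;Y|Z) - I(X;Y), and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import log


def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a sequence of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log(c / n) for c in counts.values())


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z) - I(X;Y) (one common sign convention).

    Expanded in entropies:
    H(X,Y) + H(X,Z) + H(Y,Z) - H(X) - H(Y) - H(Z) - H(X,Y,Z).
    """
    singles = entropy(x) + entropy(y) + entropy(z)
    pairs = (
        entropy(list(zip(x, y)))
        + entropy(list(zip(x, z)))
        + entropy(list(zip(y, z)))
    )
    triple = entropy(list(zip(x, y, z)))
    return pairs - singles - triple


# Hypothetical toy data: z is the XOR of two independent, balanced bits.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

print(mutual_information(x, y), mutual_information(x, z), mutual_information(y, z))
print(interaction_information(x, y, z))
```

In this toy case every pairwise mutual information is zero up to rounding, while the three-way quantity equals log 2, so the dependence is visible only when the three variables are considered jointly. Plug-in estimates are biased in small samples, so a real analysis would need a calibrated test rather than the raw value.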
Pages: 902-924
Number of pages: 23