LARGE-SCALE MULTIPLE INFERENCE OF COLLECTIVE DEPENDENCE WITH APPLICATIONS TO PROTEIN FUNCTION

Authors
Jernigan, Robert [1]
Jia, Kejue [1]
Ren, Zhao [2]
Zhou, Wen [3]
Affiliations
[1] Iowa State Univ, Program Bioinformat & Computat Biol, Dept Biochem Biophys & Mol Biol, Ames, IA 50011 USA
[2] Univ Pittsburgh, Dept Stat, Pittsburgh, PA 15260 USA
[3] Colorado State Univ, Dept Stat, Ft Collins, CO 80523 USA
Source
ANNALS OF APPLIED STATISTICS | 2021 / Vol. 15 / No. 2
Keywords
Collective dependence; false discovery rate; information theoretic measure; multiple testing; protein coevolution; structural biology; GAUSSIAN GRAPHICAL MODEL; FALSE DISCOVERY RATE; GENE-EXPRESSION; INFORMATION; ENTROPY; COEVOLUTION; VARIABILITY; NETWORK;
DOI
10.1214/20-AOAS1431
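Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;
Abstract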
Measuring the dependence of k >= 3 random variables and drawing inference from such higher-order dependences are scientifically important yet challenging. Motivated here by protein coevolution with multivariate categorical features, we consider an information theoretic measure of higher-order dependence. The proposed collective dependence is a symmetrization of differential interaction information which generalizes the mutual information of a pair of random variables. We show that the collective dependence can be easily estimated and facilitates a test on the dependence of k >= 3 random variables. Upon carefully exploring the null space of collective dependence, we devise a Classification-Assisted Large scaLe inference procedure to DEtect significant k-COllective DEpendence among d >= k random variables, with the false discovery rate controlled. Finite sample performance of our method is examined via simulations. We apply this method to the multiple protein sequence alignment data to study the residue or position coevolution for two protein families, the elongation factor P family and the zinc knuckle family. We identify novel functional triplets of amino acid residues, whose contributions to the protein function are further investigated. These confirm that the collective dependence does yield additional information important for understanding the protein coevolution compared to the pairwise measures.
Pages: 902-924
Page count: 23
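Illustrative sketch
The abstract describes an information theoretic measure of dependence among k >= 3 categorical variables that generalizes mutual information. As a rough illustration of that family of quantities, and not the paper's collective dependence or its testing procedure, the sketch below computes a plug-in estimate of the three-variable co-information (one sign convention of interaction information) from empirical frequencies. The function names `entropy` and `co_information` and the sign convention I(X;Y) - I(X;Y|Z) are assumptions made here for illustration; the record above does not give the exact definition used in the paper.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in (empirical) Shannon entropy in bits of a sequence of hashable values."""
    samples = list(samples)
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def co_information(x, y, z):
    """Plug-in co-information of three categorical samples, in bits.

    Assumed convention (hypothetical, not taken from the paper):
        I(X;Y;Z) = I(X;Y) - I(X;Y|Z)
                 = H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z)
    """
    x, y, z = list(x), list(y), list(z)
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must have the same length")
    singles = entropy(x) + entropy(y) + entropy(z)
    pairs = entropy(zip(x, y)) + entropy(zip(x, z)) + entropy(zip(y, z))
    triple = entropy(zip(x, y, z))
    return singles - pairs + triple


# XOR pattern: every pair is independent, yet the triple is fully dependent.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
zs = [a ^ b for a, b in zip(xs, ys)]
xor_value = co_information(xs, ys, zs)
```

The XOR pattern is the standard case where pairwise measures see nothing while a three-way measure does, which is the kind of additional information the abstract attributes to higher-order dependence. A plug-in estimate like this one is biased in small samples, so it is a starting point for intuition rather than a basis for inference.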