An Optimal Bahadur-Efficient Method in Detection of Sparse Signals with Applications to Pathway Analysis in Sequencing Association Studies

Cited by: 1
Authors
Dai, Hongying [1, 2]
Wu, Guodong [3 ]
Wu, Michael [4 ]
Zhi, Degui [5 ]
Affiliations
[1] Childrens Mercy Hosp, Hlth Serv & Outcomes Res, Kansas City, MO 64108 USA
[2] Univ Missouri, Dept Biomed & Hlth Informat, Kansas City, MO 64110 USA
[3] Lovelace Resp Res Inst, Albuquerque, NM USA
[4] Fred Hutchinson Canc Res Ctr, Div Publ Hlth Sci, Biostat & Biomath Program, 1124 Columbia St, Seattle, WA 98104 USA
[5] Univ Alabama Birmingham, Dept Biostat, Birmingham, AL 35294 USA
Source
PLOS ONE | 2016 / Vol. 11 / Issue 07
Funding
National Institutes of Health (USA);
Keywords
WEIGHTED Z-TEST; COMBINING PROBABILITIES; ASYMPTOTIC OPTIMALITY; BIOLOGICAL PATHWAYS; INDEPENDENT TESTS; FISHERS METHOD; MODELS;
DOI
10.1371/journal.pone.0152667
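Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract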
Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single marker-single trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, $\lim_{\varepsilon \to 0} N^{(2)}/N^{(1)} = \phi_{12}(\theta)$, compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e., $\varepsilon \to 0$. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals ($P_{N^{(i)}} < \varepsilon \to 0$). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.
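To illustrate the kind of combined p-value method the abstract refers to, here is a minimal sketch of the classical Lancaster procedure for independent p-values. Each p-value p_i is mapped to a chi-square quantile with w_i degrees of freedom, the quantiles are summed, and the sum is compared to a chi-square distribution with sum(w_i) degrees of freedom. It is not the authors' implementation: it does not handle the correlated case described in the abstract, the function names are hypothetical, and the weights are restricted to positive even integers so that only the Python standard library is needed. With every weight equal to 2 it reduces to Fisher's method.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with even df (closed-form sum)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def chi2_isf_even(p, df, tol=1e-12):
    """Inverse survival function: find x with chi2_sf_even(x, df) == p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p == 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > p:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):              # bisection on a decreasing function
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


def lancaster_combine(pvalues, weights):
    """Combine independent p-values; returns (statistic, combined p-value).

    Assumption: weights are positive even integers chosen by the user
    (for example, larger weights for units judged more informative).
    """
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    stat = sum(chi2_isf_even(p, w) for p, w in zip(pvalues, weights))
    return stat, chi2_sf_even(stat, sum(weights))


if __name__ == "__main__":
    # Hypothetical gene-level p-values and weights, for illustration only.
    stat, p_combined = lancaster_combine([0.01, 0.2, 0.5], [2, 4, 2])
    print(f"statistic={stat:.4f}, combined p={p_combined:.4g}")
```

Arbitrary positive weights would need a general chi-square quantile function from a numerical library; the even-integer restriction here is purely a convenience of the sketch.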
Pages: 18