Inflated expectations: Rare-variant association analysis using public controls

Cited by: 3
Authors
Kim, Jung [1]
Karyadi, Danielle M. [1]
Hartley, Stephen W. [1]
Zhu, Bin [1]
Wang, Mingyi [2, 3]
Wu, Dongjing [2, 3]
Song, Lei [1]
Armstrong, Gregory T. [4]
Bhatia, Smita [5]
Robison, Leslie L. [4]
Yasui, Yutaka [4]
Carter, Brian [6]
Sampson, Joshua N. [1]
Freedman, Neal D. [1]
Goldstein, Alisa M. [1]
Mirabello, Lisa [1]
Chanock, Stephen J. [1]
Morton, Lindsay M. [1]
Savage, Sharon A. [1]
Stewart, Douglas R. [1]
Affiliations
[1] NCI, Div Canc Epidemiol & Genet, Rockville, MD 20850 USA
[2] NCI, Div Canc Epidemiol & Genet, Canc Genom Res Lab, Rockville, MD USA
[3] Frederick Natl Lab Canc Res, Leidos Biomed Res Inc, Frederick, MD USA
[4] St Jude Childrens Res Hosp, Dept Epidemiol & Canc Control, Memphis, TN USA
[5] Univ Alabama Birmingham, Inst Canc Outcomes & Survivorship, Birmingham, AL USA
[6] Amer Canc Soc, Dept Populat Sci, Atlanta, GA USA
Source
PLOS ONE | 2023 / Vol. 18 / Issue 01
Keywords
DESIGN;
DOI
10.1371/journal.pone.0280951
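Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract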
The use of publicly available sequencing datasets as controls (hereafter, "public controls") in studies of rare-variant disease associations has great promise but can increase the risk of false-positive discovery. The specific factors that could contribute to an inflated distribution of test statistics have not been systematically examined. Here, we leveraged both public controls (gnomAD v2.1) and several datasets sequenced in our laboratory to systematically investigate factors that could contribute to false-positive discovery, as measured by λ_Δ95, a statistic that quantifies the degree of inflation in statistical significance. Analyses of the datasets in this investigation found that 1) the significantly inflated distribution of test statistics decreased substantially when the same variant caller and filtering pipelines were employed, 2) differences in library prep kits and sequencers did not affect the false-positive discovery rate, and 3) joint vs. separate variant calling of cases and controls did not contribute to the inflation of test statistics. Currently available methods do not adequately adjust for the high false-positive discovery rate. These results, especially if replicated, emphasize the risks of using public controls for rare-variant association tests when individual-level data and the computational pipeline are not readily accessible, because this prevents the use of the same variant-calling and filtering pipelines on both cases and controls. A plausible solution exists with the emergence of cloud-based computing, which can make it possible to bring containerized analytical pipelines to the data (rather than the data to the pipeline) and could avert or minimize these issues. We suggest that future reports account for this issue and state it as a limitation when reporting new findings from studies that cannot practically analyze all data with a single pipeline.
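The abstract does not define λ_Δ95, so it is not reproduced here. As a rough illustration of what "inflation of test statistics" means, the sketch below computes the conventional median-based genomic inflation factor from a list of p-values. This is a stand-in technique and is not the authors' statistic; the function names (`p_to_chi2_1df`, `genomic_inflation`) are hypothetical, and it assumes two-sided p-values from 1-degree-of-freedom tests.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def p_to_chi2_1df(p):
    """Convert a two-sided p-value into the equivalent 1-df chi-square statistic."""
    if not 0.0 < p < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    # The lower-tail quantile at p/2 is negative; squaring gives the chi-square.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


# Median of a 1-df chi-square distribution under the null (about 0.455).
EXPECTED_MEDIAN = p_to_chi2_1df(0.5)


def genomic_inflation(p_values):
    """Conventional genomic inflation factor (assumption: NOT the paper's statistic).

    Ratio of the observed median chi-square to the median expected under the
    null. Values near 1 suggest no inflation; values well above 1 suggest
    systematically inflated test statistics.
    """
    observed_median = median(p_to_chi2_1df(p) for p in p_values)
    return observed_median / EXPECTED_MEDIAN


if __name__ == "__main__":
    n = 1001
    null_like = [(i + 0.5) / n for i in range(n)]  # evenly spread p-values
    inflated = [p * p for p in null_like]          # p-values pushed toward 0
    print("null-like:", round(genomic_inflation(null_like), 3))
    print("inflated: ", round(genomic_inflation(inflated), 3))
```

Evenly spread p-values give a ratio close to 1, while p-values skewed toward zero give a ratio above 1, which is the pattern the abstract attributes to mismatched variant-calling and filtering pipelines.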
Pages: 13