Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations

Cited by: 48
Authors
Martin, Alicia R. [1 ,2 ,3 ]
Atkinson, Elizabeth G. [1 ,2 ,3 ]
Chapman, Sinead B. [2 ]
Stevenson, Anne [2 ,4 ]
Stroud, Rocky E. [2 ,4 ]
Abebe, Tamrat [5 ]
Akena, Dickens [6 ]
Alemayehu, Melkam [7 ]
Ashaba, Fred K. [8 ]
Atwoli, Lukoye [9 ]
Bowers, Tera [10 ]
Chibnik, Lori B. [2 ,4 ,11 ]
Daly, Mark J. [1 ,2 ,3 ,12 ]
DeSmet, Timothy [10 ]
Dodge, Sheila [10 ]
Fekadu, Abebaw [7 ,13 ]
Ferriera, Steven [10 ]
Gelaye, Bizu [4 ]
Gichuru, Stella [14 ]
Injera, Wilfred E. [15 ]
James, Roxanne [16 ]
Kariuki, Symon M. [17 ,18 ]
Kigen, Gabriel [19 ]
Koenen, Karestan C. [2 ,4 ]
Kwobah, Edith [14 ]
Kyebuzibwa, Joseph [6 ]
Majara, Lerato [16 ,20 ]
Musinguzi, Henry [8 ]
Mwema, Rehema M. [17 ]
Neale, Benjamin M. [1 ,2 ,3 ]
Newman, Carter P. [2 ,4 ]
Newton, Charles R. J. C. [17 ,18 ]
Pickrell, Joseph K. [21 ]
Ramesar, Raj [22 ]
Shiferaw, Welelta [5 ]
Stein, Dan J. [16 ,23 ,24 ]
Teferra, Solomon [7 ]
van der Merwe, Celia [1 ,2 ,3 ,16 ]
Zingela, Zukiswa [25 ]
Affiliations
[1] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[2] Broad Inst Harvard & MIT, Stanley Ctr Psychiat Res, Cambridge, MA 02142 USA
[3] Broad Inst Harvard & MIT, Program Med & Populat Genet, Cambridge, MA 02142 USA
[4] Harvard TH Chan Sch Publ Hlth, Dept Epidemiol, Boston, MA 02115 USA
[5] Addis Ababa Univ, Coll Hlth Sci, Sch Med, Dept Microbiol Immunol & Parasitol, Addis Ababa, Ethiopia
[6] Makerere Univ, Coll Hlth Sci, Sch Med, Dept Psychiat, Kampala, Uganda
[7] Addis Ababa Univ, Coll Hlth Sci, Sch Med, Dept Psychiat, Addis Ababa, Ethiopia
[8] Makerere Univ, Coll Hlth Sci, Dept Immunol & Mol Biol, Kampala, Uganda
[9] Moi Univ, Sch Med, Dept Mental Hlth, Coll Hlth Sci, Eldoret, Kenya
[10] Broad Inst MIT & Harvard, Broad Genom, 320 Charles St, Cambridge, MA 02141 USA
[11] Massachusetts Gen Hosp, Dept Neurol, Boston, MA 02114 USA
[12] Inst Mol Med Finland, Helsinki 00014, Finland
[13] Addis Ababa Univ, Ctr Innovat Drug Dev & Therapeut Trials Africa, Addis Ababa, Ethiopia
[14] Moi Teaching & Referral Hosp, Dept Mental Hlth, Eldoret, Kenya
[15] Moi Univ, Sch Med, Dept Immunol, Coll Hlth Sci, Eldoret, Kenya
[16] Univ Cape Town, Dept Psychiat & Mental Hlth, Cape Town, South Africa
[17] KEMRI Wellcome Trust Res Programme Coast, Neurosci Unit, Clin Dept, Kilifi, Kenya
[18] Univ Oxford, Dept Psychiat, Oxford OX3 7JX, England
[19] Moi Univ, Sch Med, Dept Pharmacol & Toxicol, Coll Hlth Sci, Eldoret, Kenya
[20] Univ Cape Town, Fac Hlth Sci, Inst Infect Dis & Mol Med, SA MRC Human Genet Res Unit,Div Human Genet, ZA-7925 Observatory, South Africa
[21] Gencove Inc, New York, NY 10016 USA
[22] Univ Cape Town, Inst Infect Dis & Mol Med, Dept Pathol, Div Human Genet,SA MRC Genom & Precis Med Res Uni, Cape Town, South Africa
[23] Univ Cape Town, SA MRC Unit Risk & Resilience Mental Disorders, Cape Town, South Africa
[24] Neuroscience Inst, Cape Town, South Africa
[25] Walter Sisulu Univ, Dept Psychiat & Human Behav Sci, Mthatha, South Africa
Funding
UK Medical Research Council; US National Institutes of Health;
Keywords
GENOTYPE-IMPUTATION; GENETIC ARCHITECTURE; GENOME; ASSOCIATION;
DOI
10.1016/j.ajhg.2021.03.012
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of >= 4x captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1x) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4x sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.
Pages: 656-668
Number of pages: 13