Mutation saturation for fitness effects at human CpG sites

Cited by: 16
Authors
Agarwal, Ipsita [1]
Przeworski, Molly [1, 2]
Affiliations
[1] Columbia Univ, Dept Biol Sci, New York, NY 10027 USA
[2] Columbia Univ, Dept Syst Biol, New York, NY 10027 USA
Funding
National Institutes of Health (USA);
Keywords
PROTEIN-TRUNCATING VARIANTS; POPULATION-GENETICS; SELECTION; POLYMORPHISM; INFERENCE; HISTORY; MODEL; AGE;
DOI
10.7554/eLife.71513
Chinese Library Classification number
Q [Biological sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Whole exome sequences have now been collected for millions of humans, with the related goals of identifying pathogenic mutations in patients and establishing reference repositories of data from unaffected individuals. As a result, we are approaching an important limit, in which datasets are large enough that, in the absence of natural selection, every highly mutable site will have experienced at least one mutation in the genealogical history of the sample. Here, we focus on CpG sites that are methylated in the germline and experience mutations to T at an elevated rate of ~10^-7 per site per generation; considering synonymous mutations in a sample of 390,000 individuals, ~99% of such CpG sites harbor a C/T polymorphism. Methylated CpG sites provide a natural mutation saturation experiment for fitness effects: as we show, at current sample sizes, not seeing a non-synonymous polymorphism is indicative of strong selection against that mutation. We rely on this idea to directly identify a subset of CpG transitions that are likely to be highly deleterious, including ~27% of possible loss-of-function mutations, and up to 20% of possible missense mutations, depending on the type of functional site in which they occur. Unlike methylated CpGs, most mutation types, with rates on the order of 10^-8 or 10^-9, remain very far from saturation. We discuss what these findings imply for interpreting the potential clinical relevance of mutations from their presence or absence in reference databases and for inferences about the fitness effects of new mutations.
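The saturation argument in the abstract can be illustrated with a rough back-of-the-envelope sketch. It assumes, as a simplification not stated in the abstract, that neutral mutations fall on the sample genealogy as a Poisson process, so that a site with per-generation mutation rate mu is polymorphic with probability 1 - exp(-mu * T), where T is the total branch length of the genealogy in generations. T is not given in the abstract; here it is back-calculated from the reported ~99% of synonymous methylated CpG sites being polymorphic at mu ~ 10^-7. The function names are hypothetical and this is not the authors' method.

```python
import math


def prob_polymorphic(mu: float, tree_length: float) -> float:
    """Probability of at least one mutation at a neutral site, assuming
    Poisson-distributed mutations along a genealogy whose total branch
    length is tree_length generations (a simplifying assumption)."""
    return 1.0 - math.exp(-mu * tree_length)


def implied_tree_length(mu: float, fraction_polymorphic: float) -> float:
    """Invert the formula above: the total branch length that would give
    the observed fraction of polymorphic sites at mutation rate mu."""
    return -math.log(1.0 - fraction_polymorphic) / mu


# Back out the tree length implied by the figures quoted in the abstract
# (mu ~ 1e-7 and ~99% of sites polymorphic). This is an illustrative
# estimate under the Poisson assumption, not a value from the paper.
T = implied_tree_length(1e-7, 0.99)

for mu in (1e-7, 1e-8, 1e-9):
    p = prob_polymorphic(mu, T)
    print(f"mu = {mu:.0e}: expected fraction polymorphic ~ {p:.3f}")
```

Under these assumptions, the same genealogy that saturates sites at mu ~ 10^-7 leaves only roughly 37% of sites polymorphic at 10^-8 and about 4.5% at 10^-9, which is consistent with the statement that most mutation types remain far from saturation.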
Pages: 23