ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Cited by: 10475
Authors
Wang, Kai [1]
Li, Mingyao [2]
Hakonarson, Hakon [1, 3]
Affiliations
[1] Childrens Hosp Philadelphia, Ctr Appl Genom, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pediat, Philadelphia, PA 19104 USA
Keywords
SNPS; ASSOCIATION; GENOMES;
DOI
10.1093/nar/gkq603
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ~4 min to perform gene-based annotation and ~15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
Pages: 7
Related papers
50 in total
  • [31] High-throughput muscle fiber typing from RNA sequencing data
    Nikolay Oskolkov
    Malgorzata Santel
    Hemang M. Parikh
    Ola Ekström
    Gray J. Camp
    Eri Miyamoto-Mikami
    Kristoffer Ström
    Bilal Ahmad Mir
    Dmytro Kryvokhyzha
    Mikko Lehtovirta
    Hiroyuki Kobayashi
    Ryo Kakigi
    Hisashi Naito
    Karl-Fredrik Eriksson
    Björn Nystedt
    Noriyuki Fuku
    Barbara Treutlein
    Svante Pääbo
    Ola Hansson
    Skeletal Muscle, 12
  • [32] VNTRseek--a computational tool to detect tandem repeat variants in high-throughput sequencing data
    Gelfand, Yevgeniy
    Hernandez, Yozen
    Loving, Joshua
    Benson, Gary
    NUCLEIC ACIDS RESEARCH, 2014, 42 (14) : 8884 - 8894
  • [33] Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm
    Magi, Alberto
    Benelli, Matteo
    Yoon, Seungtai
    Roviello, Franco
    Torricelli, Francesca
    NUCLEIC ACIDS RESEARCH, 2011, 39 (10) : e65
  • [34] HIGH-THROUGHPUT FUNCTIONAL ANNOTATION OF ULTRA-RARE SCHIZOPHRENIA RISK VARIANTS THROUGH CRISPR KNOCKOUT SCREENING
    Dominicus, Caia
    Fischer, Lea
    Cooper, Sarah
    Feng, Claudia
    Gouda, Mahesh
    Salazar, Melissa
    Schulze, Thomas G.
    Trynka, Gosia
    Parts, Leopold
    Bassett, Andrew
    Schulte, Eva
    EUROPEAN NEUROPSYCHOPHARMACOLOGY, 2024, 87 : 149 - 149
  • [35] An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments
    Duitama, Jorge
    Quintero, Juan Camilo
    Cruz, Daniel Felipe
    Quintero, Constanza
    Hubmann, Georg
    Foulquie-Moreno, Maria R.
    Verstrepen, Kevin J.
    Thevelein, Johan M.
    Tohme, Joe
    NUCLEIC ACIDS RESEARCH, 2014, 42 (06)
  • [36] WAVELET-BASED GENETIC ASSOCIATION ANALYSIS OF FUNCTIONAL PHENOTYPES ARISING FROM HIGH-THROUGHPUT SEQUENCING ASSAYS
    Shim, Heejung
    Stephens, Matthew
    ANNALS OF APPLIED STATISTICS, 2015, 9 (02) : 665 - 686
  • [37] mirTools 2.0 for non-coding RNA discovery, profiling and functional annotation based on high-throughput sequencing
    Wu, Jinyu
    Liu, Qi
    Wang, Xin
    Zheng, Jiayong
    Wang, Tao
    You, Mingcong
    Sun, Zhong Sheng
    Shi, Qinghua
    RNA BIOLOGY, 2013, 10 (07) : 1087 - 1092
  • [38] Comparison of high-throughput sequencing data compression tools
    Numanagic, Ibrahim
    Bonfield, James K.
    Hach, Faraz
    Voges, Jan
    Ostermann, Joern
    Alberti, Claudio
    Mattavelli, Marco
    Sahinalp, S. Cenk
    NATURE METHODS, 2016, 13 (12) : 1005 - +
  • [39] Need for speed in high-throughput sequencing data analysis
    Pluss, M.
    Caspar, S. M.
    Meienberg, J.
    Kopps, A. M.
    Keller, I.
    Bruggmann, R.
    Vogel, M.
    Matyas, G.
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2018, 26 : 721 - 722
  • [40] Genome variation discovery with high-throughput sequencing data
    Dalca, Adrian V.
    Brudno, Michael
    BRIEFINGS IN BIOINFORMATICS, 2010, 11 (01) : 3 - 14