Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive

Cited by: 74
Authors
Nellore, Abhinav [1, 2, 3]
Jaffe, Andrew E. [2, 3, 4, 5]
Fortin, Jean-Philippe [2, 3]
Alquicira-Hernandez, Jose [2, 6]
Collado-Torres, Leonardo [2, 3, 4]
Wang, Siruo [2, 7]
Phillips, Robert A., III [2, 8]
Karbhari, Nishika [2, 9]
Hansen, Kasper D. [2, 3, 10]
Langmead, Ben [1, 2, 3]
Leek, Jeffrey T. [2, 3]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD USA
[3] Johns Hopkins Univ, Ctr Computat Biol, Baltimore, MD USA
[4] Lieber Inst Brain Dev, Johns Hopkins Med Campus, Baltimore, MD USA
[5] Johns Hopkins Univ, Dept Mental Hlth, Baltimore, MD USA
[6] Univ Nacl Autonoma Mexico, Undergrad Program Genome Sci, Mexico City, DF, Mexico
[7] Ctr Coll Danville, Dept Math & Comp Sci, Danville, KY USA
[8] Salisbury Univ, Dept Biol Sci, Salisbury, MD USA
[9] Univ Texas Austin, Dept Biol Sci, Austin, TX 78712 USA
[10] Johns Hopkins Univ, McKusick Nathans Inst Genet Med, Baltimore, MD USA
Funding
UK Research and Innovation (UKRI);
Keywords
RNA-seq; Splicing; Intron; ACTIVATING MUTATIONS; ALK KINASE; GENE; EXPRESSION; TRANSCRIPTOME; IDENTIFICATION; ANNOTATION; RECEPTOR;
DOI
10.1186/s13059-016-1118-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Gene annotations, such as those in GENCODE, are derived primarily from alignments of spliced cDNA sequences and protein sequences. The impact of RNA-seq data on annotation has been confined to major projects like ENCODE and Illumina Body Map 2.0.
Results: We aligned 21,504 Illumina-sequenced human RNA-seq samples from the Sequence Read Archive (SRA) to the human genome and compared detected exon-exon junctions with junctions in several recent gene annotations. We found 56,861 junctions (18.6%) in at least 1000 samples that were not annotated, and their expression associated with tissue type. Junctions well expressed in individual samples tended to be annotated. Newer samples contributed few novel well-supported junctions, with the vast majority of detected junctions present in samples before 2013. We compiled junction data into a resource called intropolis available at http://intropolis.rail.bio. We used this resource to search for a recently validated isoform of the ALK gene and characterized the potential functional implications of unannotated junctions with publicly available TRAP-seq data.
Conclusions: Considering only the variation contained in annotation may suffice if an investigator is interested only in well-expressed transcript isoforms. However, genes that are not generally well expressed and nonetheless present in a small but significant number of samples in the SRA are likelier to be incompletely annotated. The rate at which evidence for novel junctions has been added to the SRA has tapered dramatically, even to the point of an asymptote. Now is perhaps an appropriate time to update incomplete annotations to include splicing present in the now-stable snapshot provided by the SRA.
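The comparison described in the Results (junctions detected in at least a given number of samples, checked against annotated junctions) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the junction representation, the function name `unannotated_fraction`, and the toy data are all assumptions made for illustration, and the intropolis file format is not modeled here.

```python
from typing import Dict, Set, Tuple

# A junction is identified here by (chromosome, start, end, strand).
# This representation is an assumption, not the intropolis format.
Junction = Tuple[str, int, int, str]


def unannotated_fraction(
    sample_counts: Dict[Junction, int],
    annotated: Set[Junction],
    min_samples: int,
) -> Tuple[int, int, float]:
    """Return (unannotated, total, fraction) among junctions
    detected in at least `min_samples` samples."""
    # Keep only junctions with enough supporting samples.
    supported = [j for j, n in sample_counts.items() if n >= min_samples]
    # Of those, keep the ones absent from the annotation set.
    novel = [j for j in supported if j not in annotated]
    total = len(supported)
    fraction = len(novel) / total if total else 0.0
    return len(novel), total, fraction


# Toy data, invented for illustration only.
toy_counts = {
    ("chr1", 100, 200, "+"): 1500,
    ("chr1", 300, 400, "+"): 1200,
    ("chr2", 500, 900, "-"): 1000,
    ("chr2", 950, 990, "-"): 3,
}
toy_annotation = {("chr1", 100, 200, "+"), ("chr2", 500, 900, "-")}

novel, total, frac = unannotated_fraction(toy_counts, toy_annotation, min_samples=1000)
print(novel, total, round(frac, 3))
```

With real data, `sample_counts` would hold the number of samples in which each junction was detected and `annotated` the union of junctions from the gene annotations being compared; the 1000-sample threshold in the abstract corresponds to `min_samples=1000`.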
Pages: 14
Related papers
42 in total
[1]  
[Anonymous], 2015, International Journal of Antennas and Propagation, DOI 10.1155/2015/315721
[2]   BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata [J].
Barrett, Tanya ;
Clark, Karen ;
Gevorgyan, Robert ;
Gorelenkov, Vyacheslav ;
Gribov, Eugene ;
Karsch-Mizrachi, Ilene ;
Kimelman, Michael ;
Pruitt, Kim D. ;
Resenchuk, Sergei ;
Tatusova, Tatiana ;
Yaschenko, Eugene ;
Ostell, James .
NUCLEIC ACIDS RESEARCH, 2012, 40 (D1) :D57-D63
[3]   GenBank: update [J].
Benson, DA ;
Karsch-Mizrachi, I ;
Lipman, DJ ;
Ostell, J ;
Wheeler, DL .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D23-D26
[4]   The NIH Roadmap Epigenomics Mapping Consortium [J].
Bernstein, Bradley E. ;
Stamatoyannopoulos, John A. ;
Costello, Joseph F. ;
Ren, Bing ;
Milosavljevic, Aleksandar ;
Meissner, Alexander ;
Kellis, Manolis ;
Marra, Marco A. ;
Beaudet, Arthur L. ;
Ecker, Joseph R. ;
Farnham, Peggy J. ;
Hirst, Martin ;
Lander, Eric S. ;
Mikkelsen, Tarjei S. ;
Thomson, James A. .
NATURE BIOTECHNOLOGY, 2010, 28 (10) :1045-1048
[5]   Oncogenic mutations of ALK kinase in neuroblastoma [J].
Chen, Yuyan ;
Takita, Junko ;
Choi, Young Lim ;
Kato, Motohiro ;
Ohira, Miki ;
Sanada, Masashi ;
Wang, Lili ;
Soda, Manabu ;
Kikuchi, Akira ;
Igarashi, Takashi ;
Nakagawara, Akira ;
Hayashi, Yasuhide ;
Mano, Hiroyuki ;
Ogawa, Seishi .
NATURE, 2008, 455 (7215) :971-U56
[6]   Polymorphic Cis- and Trans-Regulation of Human Gene Expression [J].
Cheung, Vivian G. ;
Nayak, Renuka R. ;
Wang, Isabel Xiaorong ;
Elwyn, Susannah ;
Cousins, Sarah M. ;
Morley, Michael ;
Spielman, Richard S. .
PLOS BIOLOGY, 2010, 8 (09)
[7]  
Consortium EP, 2011, PLOS BIOL, V9
[8]   Ensembl 2015 [J].
Cunningham, Fiona ;
Amode, M. Ridwan ;
Barrell, Daniel ;
Beal, Kathryn ;
Billis, Konstantinos ;
Brent, Simon ;
Carvalho-Silva, Denise ;
Clapham, Peter ;
Coates, Guy ;
Fitzgerald, Stephen ;
Gil, Laurent ;
Giron, Carlos Garcia ;
Gordon, Leo ;
Hourlier, Thibaut ;
Hunt, Sarah E. ;
Janacek, Sophie H. ;
Johnson, Nathan ;
Juettemann, Thomas ;
Kaehaeri, Andreas K. ;
Keenan, Stephen ;
Martin, Fergal J. ;
Maurel, Thomas ;
McLaren, William ;
Murphy, Daniel N. ;
Nag, Rishi ;
Overduin, Bert ;
Parker, Anne ;
Patricio, Mateus ;
Perry, Emily ;
Pignatelli, Miguel ;
Riat, Harpreet Singh ;
Sheppard, Daniel ;
Taylor, Kieron ;
Thormann, Anja ;
Vullo, Alessandro ;
Wilder, Steven P. ;
Zadissa, Amonida ;
Aken, Bronwen L. ;
Birney, Ewan ;
Harrow, Jennifer ;
Kinsella, Rhoda ;
Muffato, Matthieu ;
Ruffier, Magali ;
Searle, Stephen M. J. ;
Spudich, Giulietta ;
Trevanion, Stephen J. ;
Yates, Andy ;
Zerbino, Daniel R. ;
Flicek, Paul .
NUCLEIC ACIDS RESEARCH, 2015, 43 (D1) :D662-D669
[9]   The Ensembl automatic gene annotation system [J].
Curwen, V ;
Eyras, E ;
Andrews, TD ;
Clarke, L ;
Mongin, E ;
Searle, SMJ ;
Clamp, M .
GENOME RESEARCH, 2004, 14 (05) :942-950
[10]   STAR: ultrafast universal RNA-seq aligner [J].
Dobin, Alexander ;
Davis, Carrie A. ;
Schlesinger, Felix ;
Drenkow, Jorg ;
Zaleski, Chris ;
Jha, Sonali ;
Batut, Philippe ;
Chaisson, Mark ;
Gingeras, Thomas R. .
BIOINFORMATICS, 2013, 29 (01) :15-21