Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data

Cited by: 44
Authors
Matthews, Beverley B. [1 ]
dos Santos, Gilberto [1 ]
Crosby, Madeline A. [1 ]
Emmert, David B. [1 ]
St Pierre, Susan E. [1 ]
Gramates, L. Sian [1 ]
Zhou, Pinglei [1 ]
Schroeder, Andrew J. [1 ]
Falls, Kathleen [1 ]
Strelets, Victor [2 ]
Russo, Susan M. [1 ]
Gelbart, William M. [1 ]
Affiliations
[1] Harvard Univ, Dept Mol & Cellular Biol, Cambridge, MA 02138 USA
[2] Indiana Univ, Dept Biol, Bloomington, IN 47405 USA
[3] Univ New Mexico, Dept Biol, Albuquerque, NM 87131 USA
Source
G3-GENES GENOMES GENETICS | 2015 / Vol. 5 / Issue 08
Funding
UK Medical Research Council; US National Institutes of Health;
Keywords
transcriptome; alternative splice; lncRNA; transcription start site; exon junction; OPEN READING FRAMES; POLYCISTRONIC MESSENGER-RNA; MOLECULAR EVOLUTION; REFERENCE SEQUENCE; GENOME ANNOTATION; ENDOGENOUS SIRNAS; IDENTIFICATION; REVEALS; EXPRESSION; TRANSCRIPTS;
DOI
10.1534/g3.115.018929
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15-18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.
Pages: 1721 - 1736
Number of pages: 16
Related papers
50 in total
  • [1] High-throughput in vivo screen in a glioma model in Drosophila melanogaster
    Jeibmann, Astrid
    Witte, Hanna
    Klaembt, Christian
    Paulus, Werner
    JOURNAL OF NEUROGENETICS, 2009, 23 : S6 - S7
  • [2] High-throughput in vivo screen in a glioma model in Drosophila melanogaster
    Jeibmann, Astrid
    Witte, Hanna
    Klaembt, Christian
    Paulus, Werner
    ACTA NEUROPATHOLOGICA, 2008, 116 (03) : 347 - 347
  • [3] Drosophila melanogaster as a High-Throughput Model for Host-Microbiota Interactions
    Trinder, Mark
    Daisley, Brendan A.
    Dube, Josh S.
    Reid, Gregor
    FRONTIERS IN MICROBIOLOGY, 2017, 8
  • [4] categoryCompare: high-throughput data meta-analysis using gene annotations
    Robert M Flight
    Jeffrey C Petruska
    Benjamin J Harrison
    Eric C Rouchka
    BMC Bioinformatics, 12
  • [5] categoryCompare: high-throughput data meta-analysis using gene annotations
    Flight, Robert M.
    Petruska, Jeffrey C.
    Harrison, Benjamin J.
    Rouchka, Eric C.
    BMC BIOINFORMATICS, 2011, 12
  • [6] Gene Model Annotations for Drosophila melanogaster: The Rule-Benders
    Crosby, Madeline A.
    Gramates, L. Sian
    dos Santos, Gilberto
    Matthews, Beverley B.
    St Pierre, Susan E.
    Zhou, Pinglei
    Schroeder, Andrew J.
    Falls, Kathleen
    Emmert, David B.
    Russo, Susan M.
    Gelbart, William M.
    G3-GENES GENOMES GENETICS, 2015, 5 (08) : 1737 - 1749
  • [7] High-resolution, high-throughput SNP mapping in Drosophila melanogaster
    Chen D.
    Ahlford A.
    Schnorrer F.
    Kalchhauser I.
    Fellner M.
    Viràgh E.
    Kiss I.
    Syvänen A.-C.
    Dickson B.J.
    Nature Methods, 2008, 5 (4) : 323 - 329
  • [8] A reductionist paradigm for high-throughput behavioural fingerprinting in Drosophila melanogaster
    Jones, Hannah
    Willis, Jenny A.
    Firth, Lucy C.
    Giachello, Carlo N. G.
    Gilestro, Giorgio F.
    ELIFE, 2023, 12
  • [9] High-resolution, high-throughput SNP mapping in Drosophila melanogaster
    Chen, Doris
    Ahlford, Annika
    Schnorrer, Frank
    Kalchhauser, Irene
    Fellner, Michaela
    Viragh, Erika
    Kiss, Istvan
    Syvanen, Ann-Christine
    Dickson, Barry J.
    NATURE METHODS, 2008, 5 (04) : 323 - 329
  • [10] Biolistics for high-throughput transformation and RNA interference in Drosophila melanogaster
    Yuen, Jenna L.
    Read, Scott A.
    Brubacher, John L.
    Singh, Aditi D.
    Whyard, Steven
    FLY, 2008, 2 (05)