Data structures for genome annotation, alternative splicing, and validation

Authors
Mielordt, Sven [1]
Grosse, Ivo
Kleffe, Juergen
Affiliations
[1] Leibniz Inst Plant Genet & Crop Plant Res, IPK, D-06466 Gatersleben, Germany
[2] Charite Univ Med Berlin, UND Boinformat, Inst Mol Biol, D-14195 Berlin, Germany
Keywords
gene and genome annotation; alternative splicing; data integration; splice template; validation and confirmation; quality control; Fasta-XML format
DOI
Not available
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
To establish a clean basis for studying alternative splicing and gene regulation in life science projects, both a powerful data model and a strict validation procedure for assigning levels of reliability to given gene models are essential. One common problem of public genome databases is insufficiently organized and linked description data, which makes it difficult to study the relations among the alternative isoforms of a gene that are relevant for medicine and plant genome research. This is a severe obstacle to the integration of biological data and motivated us to establish a new modeling instance that we call the splice template, or sTMP. Every sTMP has a unique splicing pattern, but the lengths of its first and last exons remain undefined. This makes it possible to model different gene isoforms with the same splicing pattern. Using this more fine-grained data structure uncovers many cases of plurivalent mRNA-CDS relations. There are more than 3,000 extra CDSs in the human genome compatible with the categories sTMP, mRNA and CDS, which exceed the classical one-to-one relations of mRNAs and CDSs. In one case, 11 extra CDSs are compatible with one mRNA. Crosslinks between mRNAs derived from different sTMPs leading to the same CDS are now accessible, as are disease-related ruptures in UTR regions. This allows discovering and validating disease- and tissue-specific differences in alternative splicing, gene expression and regulation. Another problem in public databases is an overly relaxed standard for labeling genes "confirmed by ESTs and full-length-cDNAs." We provide a pipeline that handles gene annotations from different sources, integrates them into complex gene models and assigns strict validation tags, constrained by a local low-error model for the alignments of genome annotation and transcripts.
The data structures are being implemented and made publicly available at the Plant Data Warehouse of the Bioinformatics Center Gatersleben-Halle (http://portal.bic-gh.de/sTMP).
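The splice template idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Transcript` class, the `splice_template_key` and `group_by_template` functions, the coordinate convention (1-based, inclusive) and the example coordinates are all assumptions made for illustration. The sketch treats the "splicing pattern" as the ordered list of introns, so the outer ends of the first and last exon do not influence the key, and transcripts that differ only in those ends fall into the same group.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Exon = Tuple[int, int]    # (start, end), 1-based inclusive (assumed convention)
Intron = Tuple[int, int]  # (first intronic base, last intronic base)
TemplateKey = Tuple[str, str, Tuple[Intron, ...]]


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record; not taken from the paper."""
    name: str
    seqid: str
    strand: str
    exons: Tuple[Exon, ...]


def splice_template_key(t: Transcript) -> TemplateKey:
    """Return a key built only from the introns between consecutive exons.

    The start of the first exon and the end of the last exon never enter
    the key, which mirrors the statement that their lengths stay undefined.
    """
    exons = sorted(t.exons)
    introns = tuple((left[1] + 1, right[0] - 1)
                    for left, right in zip(exons, exons[1:]))
    return (t.seqid, t.strand, introns)


def group_by_template(transcripts: List[Transcript]) -> Dict[TemplateKey, List[str]]:
    """Group transcript names by their splice template key."""
    groups: Dict[TemplateKey, List[str]] = defaultdict(list)
    for t in transcripts:
        groups[splice_template_key(t)].append(t.name)
    return dict(groups)


# Made-up example: a and b share all introns but differ in their outer ends;
# c skips the middle exon and therefore has a different splicing pattern.
a = Transcript("mRNA_a", "chr1", "+", ((100, 200), (300, 400), (500, 650)))
b = Transcript("mRNA_b", "chr1", "+", ((150, 200), (300, 400), (500, 600)))
c = Transcript("mRNA_c", "chr1", "+", ((100, 200), (500, 650)))

templates = group_by_template([a, b, c])
for key, names in templates.items():
    print(key, names)
```

One limitation of this simplification is that all single-exon transcripts on the same sequence and strand share an empty intron list and would collapse into one group; how the actual sTMP model treats such cases is not stated in the abstract.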
Pages: 114-123
Page count: 10