Data structures for genome annotation, alternative splicing, and validation

Authors
Mielordt, Sven [1]
Grosse, Ivo
Kleffe, Juergen
Affiliations
[1] Leibniz Inst Plant Genet & Crop Plant Res, IPK, D-06466 Gatersleben, Germany
[2] Charite Univ Med Berlin, UND Bioinformat, Inst Mol Biol, D-14195 Berlin, Germany
Keywords
gene and genome annotation; alternative splicing; data integration; splice template; validation and confirmation; quality control; Fasta-XML format
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
To establish a clean basis for studying alternative splicing and gene regulation in life science projects, powerful data modeling and a strict validation procedure for assigning levels of reliability to given gene models are essential. One common problem of public genome databases is insufficiently organized and linked description data, which makes it difficult to study the relations among the alternative isoforms of a gene that are relevant for medicine and plant genome research. This is a severe obstacle for the integration of biological data and motivated us to establish a new modeling instance that we call the splice template, or sTMP. Every sTMP has a unique splicing pattern, but the lengths of the first and the last exon remain undefined. This makes it possible to model different gene isoforms with the same splicing pattern. By utilizing this more fine-grained data structure, many cases of plurivalent mRNA-CDS relations are uncovered. There are more than 3,000 extra CDSs in the human genome compatible with the categories sTMP, mRNA and CDS, which exceed the classical one-to-one relations of mRNAs and CDSs. In one case, 11 extra CDSs are compatible with one mRNA. Crosslinks between mRNAs derived from different sTMPs leading to the same CDS are now accessible, as are disease-related ruptures in UTR regions. This allows discovering and validating disease- and tissue-specific differences in alternative splicing, gene expression and regulation. Another problem in public databases is an overly relaxed standard for labeling genes "confirmed by ESTs and full-length cDNAs." We provide a pipeline that handles gene annotations from different sources, integrates them into complex gene models and assigns strict validation tags, constrained by a local low-error model for the alignments of genome annotation and transcripts.
The data structures are being implemented and made publicly available at the Plant Data Warehouse of the Bioinformatics Center Gatersleben-Halle (http://portal.bic-gh.de/sTMP).
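The splice template idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Transcript, splice_template and group_by_template, the 1-based inclusive exon coordinates, and the example transcripts are all assumptions made for illustration. The sketch treats an sTMP as the ordered chain of intron boundaries, so the start of the first exon and the end of the last exon do not enter the key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical convention: an exon is (start, end), 1-based and inclusive.
Exon = Tuple[int, int]
# Assumed representation of an sTMP: the ordered intron boundaries.
TemplateKey = Tuple[Tuple[int, int], ...]


@dataclass(frozen=True)
class Transcript:
    name: str
    exons: Tuple[Exon, ...]  # sorted by start position, all on one strand


def splice_template(t: Transcript) -> TemplateKey:
    """Return (end of exon i, start of exon i+1) for each adjacent exon pair.

    The start of the first exon and the end of the last exon are never
    used, so transcripts that differ only there share one template.
    """
    return tuple((a[1], b[0]) for a, b in zip(t.exons, t.exons[1:]))


def group_by_template(transcripts: List[Transcript]) -> Dict[TemplateKey, List[str]]:
    """Group transcript names by their splice template key."""
    groups: Dict[TemplateKey, List[str]] = defaultdict(list)
    for t in transcripts:
        groups[splice_template(t)].append(t.name)
    return dict(groups)


# Invented example data: mRNA_a and mRNA_b differ only in their outer ends,
# while mRNA_c skips the middle exon.
mrna_a = Transcript("mRNA_a", ((100, 200), (300, 400), (500, 650)))
mrna_b = Transcript("mRNA_b", ((150, 200), (300, 400), (500, 600)))
mrna_c = Transcript("mRNA_c", ((100, 200), (500, 650)))

groups = group_by_template([mrna_a, mrna_b, mrna_c])
```

Here mRNA_a and mRNA_b fall under one template and mRNA_c under another. In this sketch every single-exon transcript maps to the same empty key, which a real model would need to handle separately; attaching several CDSs to each mRNA would be a further layer on top of this grouping.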
Pages: 114-123
Number of pages: 10