The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Cited by: 23
Authors
Lichtenberg, Jens [1]
Yilmaz, Alper [2]
Welch, Joshua D. [1]
Kurz, Kyle [1]
Liang, Xiaoyu [1]
Drews, Frank [1]
Ecker, Klaus [1]
Lee, Stephen S. [3]
Geisler, Matt [4]
Grotewold, Erich [2]
Welch, Lonnie R. [1, 5, 6]
Affiliations
[1] Ohio Univ, Bioinformat Lab, Sch Elect Engn & Comp Sci, Athens, OH 45701 USA
[2] Ohio State Univ, Dept Plant Cellular & Mol Biol, Ctr Plant Biotechnol, Columbus, OH 43210 USA
[3] Univ Idaho, Dept Stat, Moscow, ID 83843 USA
[4] So Illinois Univ, Dept Plant Biol, Carbondale, IL 62901 USA
[5] Ohio Univ, Biomed Engn Program, Athens, OH 45701 USA
[6] Ohio Univ, Mol & Cellular Biol Program, Athens, OH 45701 USA
Source
BMC GENOMICS | 2009 / Vol. 10
Funding
U.S. National Science Foundation;
Keywords
CIS-REGULATORY-ELEMENTS; FACTOR-BINDING SITES; 1ST INTRON CONTRIBUTE; GENE-EXPRESSION; TRANSCRIPTIONAL CONTROL; COMPUTATIONAL ANALYSIS; INFORMATION RESOURCE; IDENTIFICATION; DISCOVERY; PROMOTERS;
DOI
10.1186/1471-2164-10-463
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression.
Results: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3' UTRs and 5' UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others.
Conclusion: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
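The comparison of segment word frequencies against genome-wide frequencies described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `count_words` and `enrichment`, the pseudocount smoothing, and the simple frequency ratio used as a score are all assumptions made for illustration, and the abstract does not state which statistic the authors used.

```python
from collections import Counter
from itertools import product


def count_words(sequences, k=8):
    """Count every overlapping k-letter word over A/C/G/T in the sequences.

    Hypothetical helper: windows containing other symbols (e.g. N) are skipped.
    """
    counts = Counter()
    alphabet = set("ACGT")
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= alphabet:
                counts[word] += 1
    return counts


def enrichment(segment_counts, genome_counts, k=8, pseudocount=1.0):
    """Ratio of each word's relative frequency in a segment to its
    genome-wide relative frequency.

    Assumption: a pseudocount is added to every one of the 4**k words so
    that words absent from either set do not cause division by zero.
    """
    n_words = 4 ** k  # 65,536 possible words when k = 8
    seg_total = sum(segment_counts.values()) + pseudocount * n_words
    gen_total = sum(genome_counts.values()) + pseudocount * n_words
    scores = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        seg_freq = (segment_counts[word] + pseudocount) / seg_total
        gen_freq = (genome_counts[word] + pseudocount) / gen_total
        scores[word] = seg_freq / gen_freq
    return scores


if __name__ == "__main__":
    # Toy sequences stand in for promoter regions and the whole genome.
    promoters = ["TATAAATATATAAAGG", "CCTATAAATAGGCTAT"]
    genome = promoters + ["GGCCGGCCAATTGGCCAAGGCCTT", "ACGTACGTACGTTTGACAACGT"]
    scores = enrichment(count_words(promoters, k=4), count_words(genome, k=4), k=4)
    top = sorted(scores, key=scores.get, reverse=True)[:5]
    print(top)
```

A ratio above 1 marks a word as over-represented in the segment relative to the genome. A real analysis would replace the raw ratio with a significance test and run on the annotated promoter, intron, and UTR sequences.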
Pages: 31