The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Cited by: 23
Authors
Lichtenberg, Jens [1]
Yilmaz, Alper [2]
Welch, Joshua D. [1]
Kurz, Kyle [1]
Liang, Xiaoyu [1]
Drews, Frank [1]
Ecker, Klaus [1]
Lee, Stephen S. [3]
Geisler, Matt [4]
Grotewold, Erich [2]
Welch, Lonnie R. [1, 5, 6]
Affiliations
[1] Ohio Univ, Bioinformat Lab, Sch Elect Engn & Comp Sci, Athens, OH 45701 USA
[2] Ohio State Univ, Dept Plant Cellular & Mol Biol, Ctr Plant Biotechnol, Columbus, OH 43210 USA
[3] Univ Idaho, Dept Stat, Moscow, ID 83843 USA
[4] So Illinois Univ, Dept Plant Biol, Carbondale, IL 62901 USA
[5] Ohio Univ, Biomed Engn Program, Athens, OH 45701 USA
[6] Ohio Univ, Mol & Cellular Biol Program, Athens, OH 45701 USA
Source
BMC GENOMICS | 2009 / Vol. 10
Funding
U.S. National Science Foundation;
Keywords
CIS-REGULATORY-ELEMENTS; FACTOR-BINDING SITES; 1ST INTRON CONTRIBUTE; GENE-EXPRESSION; TRANSCRIPTIONAL CONTROL; COMPUTATIONAL ANALYSIS; INFORMATION RESOURCE; IDENTIFICATION; DISCOVERY; PROMOTERS;
DOI
10.1186/1471-2164-10-463
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression.
Results: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3' UTRs and 5' UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others.
Conclusion: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
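The enumerative idea in the abstract (count every 8-letter word over A, C, G, T, of which there are 4^8 = 65,536, and compare a segment's word frequencies with genome-wide frequencies) can be illustrated with a minimal sketch. This is not the authors' pipeline or their statistical scoring: the function names, the pseudocount smoothing, and the simple frequency ratio are all assumptions made for illustration, and treating zero-count words as candidate 'unwords' is one possible reading of the term.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def count_words(sequence, k=8):
    """Count every overlapping k-letter word made only of A, C, G, T."""
    sequence = sequence.upper()
    valid = set(ALPHABET)
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(word) <= valid:
            counts[word] += 1
    return counts


def all_words(k=8):
    """Enumerate all 4**k possible words (65,536 for k=8)."""
    return ("".join(p) for p in product(ALPHABET, repeat=k))


def enrichment(segment_counts, genome_counts, k=8, pseudocount=1.0):
    """Ratio of a word's relative frequency in a segment to its
    genome-wide relative frequency.

    The pseudocount is an illustrative assumption that avoids
    division by zero; it is not taken from the paper.
    """
    n_words = len(ALPHABET) ** k
    seg_total = sum(segment_counts.values()) + pseudocount * n_words
    gen_total = sum(genome_counts.values()) + pseudocount * n_words
    return {
        w: ((segment_counts[w] + pseudocount) / seg_total)
        / ((genome_counts[w] + pseudocount) / gen_total)
        for w in all_words(k)
    }


def absent_words(segment_counts, k=8):
    """Words that never occur in the segment (candidate 'unwords')."""
    return [w for w in all_words(k) if segment_counts[w] == 0]
```

In practice, `count_words` would be run once over the whole genome and once per segment class (promoters, introns, 5' UTRs, 3' UTRs), and the resulting ratios would then be passed to whatever significance test is preferred.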
Pages: 31
Related papers
50 in total
  • [31] Conservation of microstructure between a sequenced region of the genome of rice and multiple segments of the genome of Arabidopsis thaliana
    Mayer, K
    Murphy, G
    Tarchini, R
    Wambutt, R
    Volckaert, G
    Pohl, T
    Düsterhöft, A
    Stiekema, W
    Entian, KD
    Terryn, N
    Lemcke, K
    Haase, D
    Hall, CR
    van Dodeweerd, AM
    Tingey, SV
    Mewes, HW
    Bevan, MW
    Bancroft, I
    GENOME RESEARCH, 2001, 11 (07) : 1167 - 1174
  • [32] The landscape of long non-coding RNAs in cancer
    Niknafs, Yashar S.
    Iyer, Matthew K.
    Chinnaiyan, Arul M.
    CANCER RESEARCH, 2015, 75
  • [33] Signaling landscape of mitochondrial non-coding RNAs
    Mutharasu, Gnanavel
    Murugesan, Akshaya
    Kondamani, Saravnan
    Thiyagarajan, Ramesh
    Yli-Harja, Olli
    Kandhavelu, Meenakshisundaram
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2023, 41 (21) : 12016 - 12025
  • [34] The landscape of non-coding RNAs in the immunopathogenesis of Endometriosis
    Abbaszadeh, Mohammad
    Karimi, Mohammadreza
    Rajaei, Samira
    FRONTIERS IN IMMUNOLOGY, 2023, 14
  • [35] Genome-wide analysis of long non-coding RNAs under diel light exhibits role in floral development and the circadian clock in Arabidopsis thaliana
    Yadav, Vikash Kumar
    Sawant, Samir Vishwanath
    Yadav, Amrita
    Jalmi, Siddhi Kashinath
    Kerkar, Savita
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2022, 223 : 1693 - 1704
  • [36] Distribution of microsatellites in relation to coding sequences within the Arabidopsis thaliana genome
    Casacuberta, E
    Puigdomènech, P
    Monfort, A
    PLANT SCIENCE, 2000, 157 (01) : 97 - 104
  • [37] Decoding the non-coding genome: elucidating genetic risk outside the coding genome
    Barr, C. L.
    Misener, V. L.
    GENES BRAIN AND BEHAVIOR, 2016, 15 (01) : 187 - 204
  • [38] The non-coding genome in Autism Spectrum Disorders
    Dominguez-Alonso, S.
    Carracedo, A.
    Rodriguez-Fontenla, C.
    EUROPEAN JOURNAL OF MEDICAL GENETICS, 2023, 66 (06)
  • [39] CRISPR-ing the non-coding genome
    Montoliu, Lluis
    Fernandez, Almudena
    Josa, Santiago
    Jimenez, Rafael
    Cantero, Marta
    Fernandez, Julia
    Seruggia, Davide
    TRANSGENIC RESEARCH, 2016, 25 (02) : 202 - 202
  • [40] Glucocorticoids regulate the human non-coding genome
    Kwiat, Robert Ernest
    Tran, Thai
    Cao, Qilin
    Gadkari, Manasi
    Randazzo, Davide
    Franco, Luis M.
    JOURNAL OF IMMUNOLOGY, 2023, 210 (01):