The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Cited by: 23
Authors
Lichtenberg, Jens [1]
Yilmaz, Alper [2]
Welch, Joshua D. [1]
Kurz, Kyle [1]
Liang, Xiaoyu [1]
Drews, Frank [1]
Ecker, Klaus [1]
Lee, Stephen S. [3]
Geisler, Matt [4]
Grotewold, Erich [2]
Welch, Lonnie R. [1, 5, 6]
Affiliations
[1] Ohio Univ, Bioinformat Lab, Sch Elect Engn & Comp Sci, Athens, OH 45701 USA
[2] Ohio State Univ, Dept Plant Cellular & Mol Biol, Ctr Plant Biotechnol, Columbus, OH 43210 USA
[3] Univ Idaho, Dept Stat, Moscow, ID 83843 USA
[4] So Illinois Univ, Dept Plant Biol, Carbondale, IL 62901 USA
[5] Ohio Univ, Biomed Engn Program, Athens, OH 45701 USA
[6] Ohio Univ, Mol & Cellular Biol Program, Athens, OH 45701 USA
Source
BMC GENOMICS | 2009 / Vol. 10
Funding
US National Science Foundation;
Keywords
CIS-REGULATORY-ELEMENTS; FACTOR-BINDING SITES; 1ST INTRON CONTRIBUTE; GENE-EXPRESSION; TRANSCRIPTIONAL CONTROL; COMPUTATIONAL ANALYSIS; INFORMATION RESOURCE; IDENTIFICATION; DISCOVERY; PROMOTERS;
DOI
10.1186/1471-2164-10-463
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Background: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression.

Results: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3' UTRs and 5' UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others.

Conclusion: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
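The enumerative word counting described in the abstract can be illustrated with a minimal sketch. It assumes single-strand counting of overlapping words and uses a pseudocount-smoothed frequency ratio (segment vs. genome) as a stand-in enrichment score. The function names, the pseudocount value, and the scoring rule are hypothetical; the record does not state the authors' actual statistics or whether reverse complements were merged. The 65,536 figure follows from an alphabet of 4 bases and a word length of 8 (4^8 = 65,536).

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def all_words(k: int = 8) -> list[str]:
    """Enumerate every DNA word of length k; for k = 8 this gives 4**8 = 65,536 words."""
    return ["".join(letters) for letters in product(ALPHABET, repeat=k)]


def count_words(seq: str, k: int = 8) -> Counter:
    """Count overlapping k-letter words on one strand only (an assumption),
    skipping windows that contain non-ACGT letters such as N."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def enrichment(segment: Counter, genome: Counter, k: int = 8,
               pseudo: float = 0.5) -> dict[str, float]:
    """Hypothetical enrichment score: smoothed word frequency in a segment
    divided by smoothed genome-wide frequency. Values > 1 suggest enrichment,
    values < 1 suggest depletion (candidate 'unwords')."""
    vocab = 4 ** k
    seg_total = sum(segment.values()) + pseudo * vocab
    gen_total = sum(genome.values()) + pseudo * vocab
    return {
        w: ((segment[w] + pseudo) / seg_total) / ((genome[w] + pseudo) / gen_total)
        for w in all_words(k)
    }
```

In practice the genome-wide counts would be built once from all chromosomes and reused for each segment type (promoters, introns, 3' UTRs, 5' UTRs). The smoothed ratio is only a simple illustration; a real analysis would attach a significance test to each word before calling it "statistically interesting."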
Pages: 31