Background: Understanding how disease develops, and designing effective new treatments, requires investigating the interconnections between the molecular and genetic events that govern biological systems. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the rapidly growing volume of genomic data in the public domain is fast becoming one of the key challenges facing the research community.

Methodology/Results: We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis, and we ensure that the majority of the findings are derived from multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.

Conclusions: Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and provide evidence to support existing hypotheses.