Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

Cited by: 285
Authors
Kupershmidt, Ilya [1, 2]
Su, Qiaojuan Jane [1]
Grewal, Anoop [1]
Sundaresh, Suman [1]
Halperin, Inbal [1]
Flynn, James [1]
Shekar, Mamatha [1]
Wang, Helen [1]
Park, Jenny [1]
Cui, Wenwu [1]
Wall, Gregory D. [1]
Wisotzkey, Robert [1]
Alag, Satnam [1]
Akhtari, Saeid [1]
Ronaghi, Mostafa [1, 3]
Affiliations
[1] NextBio, Cupertino, CA USA
[2] Royal Inst Technol KTH, Stockholm, Sweden
[3] Illumina, San Diego, CA USA
Source
PLOS ONE | 2010 / Volume 5 / Issue 09
Keywords
GENE-EXPRESSION PROFILES; ADIPOCYTE DIFFERENTIATION; SKELETAL-MUSCLE; CANCER; BROWN; MICROARRAYS; SIGNATURES; IDENTIFICATION; PROBESETS; ARRAY;
DOI
10.1371/journal.pone.0013066
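Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;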
Abstract
Background: The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.
Methodology/Results: We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.
Conclusions: Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
Pages: 13
Related papers
50 in total
  • [1] categoryCompare: high-throughput data meta-analysis using gene annotations
    Robert M Flight
    Jeffrey C Petruska
    Benjamin J Harrison
    Eric C Rouchka
    BMC Bioinformatics, 12
  • [2] categoryCompare: high-throughput data meta-analysis using gene annotations
    Flight, Robert M.
    Petruska, Jeffrey C.
    Harrison, Benjamin J.
    Rouchka, Eric C.
    BMC BIOINFORMATICS, 2011, 12
  • [3] DupChecker: a bioconductor package for checking high-throughput genomic data redundancy in meta-analysis
    Sheng, Quanhu
    Shyr, Yu
    Chen, Xi
    BMC BIOINFORMATICS, 2014, 15
  • [4] DupChecker: a bioconductor package for checking high-throughput genomic data redundancy in meta-analysis
    Quanhu Sheng
    Yu Shyr
    Xi Chen
    BMC Bioinformatics, 15
  • [5] A Framework for Evaluating Field-Based, High-Throughput Phenotyping Systems: A Meta-Analysis
    Young, Sierra N.
    SENSORS, 2019, 19 (16)
  • [6] Statistical methods for the analysis of high-throughput data based on functional profiles derived from the Gene Ontology
    Sanchez, Alex
    Salicru, Miquel
    Ocana, Jordi
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2007, 137 (12) : 3975 - 3989
  • [7] A Framework for Analysis of Ontology-Based Data Access
    Konys, Agnieszka
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2016, PT II, 2016, 9876 : 397 - 408
  • [8] Using the BioAssay Ontology for Analyzing High-Throughput Screening Data
    Balderud, Linda Zander
    Murray, David
    Larsson, Niklas
    Vempati, Uma
    Schuerer, Stephan C.
    Bjareland, Marcus
    Engkvist, Ola
    JOURNAL OF BIOMOLECULAR SCREENING, 2015, 20 (03) : 402 - 415
  • [9] A novel meta-analysis method exploiting consistency of high-throughput experiments
    Rajaram, Satwik
    BIOINFORMATICS, 2009, 25 (05) : 636 - 642
  • [10] DSLE2 random-effects meta-analysis model for high-throughput methylation data
    Wang, Nan
    Zhou, Yang
    Zhu, Fengping
    Jin, Shuilin
    BMC GENOMICS, 2025, 26 (01):