Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

Cited by: 285
Authors
Kupershmidt, Ilya [1, 2]
Su, Qiaojuan Jane [1 ]
Grewal, Anoop [1 ]
Sundaresh, Suman [1 ]
Halperin, Inbal [1 ]
Flynn, James [1 ]
Shekar, Mamatha [1 ]
Wang, Helen [1 ]
Park, Jenny [1 ]
Cui, Wenwu [1 ]
Wall, Gregory D. [1 ]
Wisotzkey, Robert [1 ]
Alag, Satnam [1 ]
Akhtari, Saeid [1 ]
Ronaghi, Mostafa [1, 3]
Affiliations
[1] NextBio, Cupertino, CA USA
[2] Royal Inst Technol KTH, Stockholm, Sweden
[3] Illumina, San Diego, CA USA
Source
PLOS ONE | 2010 / Volume 5 / Issue 09
Keywords
GENE-EXPRESSION PROFILES; ADIPOCYTE DIFFERENTIATION; SKELETAL-MUSCLE; CANCER; BROWN; MICROARRAYS; SIGNATURES; IDENTIFICATION; PROBESETS; ARRAY;
DOI
10.1371/journal.pone.0013066
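Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code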
07 ; 0710 ; 09 ;
Abstract
Background: The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.
Methodology/Results: We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.
Conclusions: Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
Pages: 13