Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

Cited by: 285
Authors
Kupershmidt, Ilya [1, 2]
Su, Qiaojuan Jane [1]
Grewal, Anoop [1]
Sundaresh, Suman [1]
Halperin, Inbal [1]
Flynn, James [1]
Shekar, Mamatha [1]
Wang, Helen [1]
Park, Jenny [1]
Cui, Wenwu [1]
Wall, Gregory D. [1]
Wisotzkey, Robert [1]
Alag, Satnam [1]
Akhtari, Saeid [1]
Ronaghi, Mostafa [1, 3]
Affiliations
[1] NextBio, Cupertino, CA USA
[2] Royal Inst Technol KTH, Stockholm, Sweden
[3] Illumina, San Diego, CA USA
Source
PLOS ONE | 2010 / Vol. 5 / Issue 09
Keywords
GENE-EXPRESSION PROFILES; ADIPOCYTE DIFFERENTIATION; SKELETAL-MUSCLE; CANCER; BROWN; MICROARRAYS; SIGNATURES; IDENTIFICATION; PROBESETS; ARRAY;
DOI
10.1371/journal.pone.0013066
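Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes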
07 ; 0710 ; 09 ;
Abstract
Background: The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.
Methodology/Results: We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.
Conclusions: Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
Pages: 13
Related papers
50 in total
  • [31] Dataset Construction through Ontology-Based Data Requirements Analysis
    Jiang, Liangru
    Wang, Xi
    APPLIED SCIENCES-BASEL, 2024, 14 (06)
  • [32] Ontology-based Big Data Analysis for Orchid Smart Farming
    Kaewboonma, Nattapong
    Chansanam, Wirapong
    Buranarach, Marut
    LIBRES-LIBRARY AND INFORMATION SCIENCE RESEARCH ELECTRONIC JOURNAL, 2019, 29 (02) : 91 - 98
  • [33] An Ontology-based Framework to Support Multivariate Qualitative Data Analysis
    Roda, Fernando
    Musulin, Estanislao
    24TH EUROPEAN SYMPOSIUM ON COMPUTER AIDED PROCESS ENGINEERING, PTS A AND B, 2014, 33 : 1891 - 1896
  • [34] Meta-analysis of global and high throughput public gene array data for robust vascular gene expression discovery in chronic rhinosinusitis: Implications in controlled release
    Khurana, Nitish
    Pulsipher, Abigail
    Ghandehari, Hamidreza
    Alt, Jeremiah A.
    JOURNAL OF CONTROLLED RELEASE, 2021, 330 : 878 - 888
  • [35] High-throughput data analysis and data integration for vaccine trials
    Weiner, January, III
    Kaufmann, Stefan H. E.
    Maertzdorf, Jeroen
    VACCINE, 2015, 33 (40) : 5249 - 5255
  • [36] The Application of Cheminformatics in the Analysis of High-Throughput Screening Data
    Walters, W. Patrick
    Aronov, Alexander
    Goldman, Brian
    McClain, Brian
    Perola, Emanuele
    Weiss, Jonathan
    FRONTIERS IN MOLECULAR DESIGN AND CHEMICAL INFORMATION SCIENCE - HERMAN SKOLNIK AWARD SYMPOSIUM 2015: JURGEN BAJORATH, 2016, 1222 : 269 - 282
  • [37] High-throughput metaproteomics data analysis with Unipept: A tutorial
    Mesuere, Bart
    Van der Jeugt, Felix
    Willems, Toon
    Naessens, Tom
    Devreese, Bart
    Martens, Lennart
    Dawyndt, Peter
    JOURNAL OF PROTEOMICS, 2018, 171 : 11 - 22
  • [38] Need for speed in high-throughput sequencing data analysis
    Pluss, M.
    Caspar, S. M.
    Meienberg, J.
    Kopps, A. M.
    Keller, I.
    Bruggmann, R.
    Vogel, M.
    Matyas, G.
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2018, 26 : 721 - 722
  • [39] Feature cluster selection for high-throughput data analysis
    Yu, Lei
    Li, Hao
    2007 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2007, : 9 - 14
  • [40] Computational Methods for Analysis of High-Throughput Screening Data
    Balakin, Konstantin V.
    Savchuk, Nikolay P.
    CURRENT COMPUTER-AIDED DRUG DESIGN, 2006, 2 (01) : 1 - 19