Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

Cited by: 285

Authors:

Kupershmidt, Ilya [1,2]
Su, Qiaojuan Jane [1]
Grewal, Anoop [1]
Sundaresh, Suman [1]
Halperin, Inbal [1]
Flynn, James [1]
Shekar, Mamatha [1]
Wang, Helen [1]
Park, Jenny [1]
Cui, Wenwu [1]
Wall, Gregory D. [1]
Wisotzkey, Robert [1]
Alag, Satnam [1]
Akhtari, Saeid [1]
Ronaghi, Mostafa [1,3]

Affiliations:

[1] NextBio, Cupertino, CA USA

[2] Royal Inst Technol KTH, Stockholm, Sweden

[3] Illumina, San Diego, CA USA

Source:

PLOS ONE | 2010 / Volume 5 / Issue 09

Keywords:

GENE-EXPRESSION PROFILES; ADIPOCYTE DIFFERENTIATION; SKELETAL-MUSCLE; CANCER; BROWN; MICROARRAYS; SIGNATURES; IDENTIFICATION; PROBESETS; ARRAY;

DOI:

10.1371/journal.pone.0013066

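Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];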

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Background: The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.

Methodology/Results: We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.

Conclusions: Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
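The abstract names rank-based enrichment statistics as one ingredient of the framework but does not say which statistic is used. As a minimal illustration of the general idea only, the sketch below assumes a Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation: given a gene list ranked from most to least differentially expressed, it asks whether a query gene set sits nearer the top than chance would predict. The function name `rank_enrichment` and the gene labels are hypothetical and are not taken from the paper.

```python
import math


def rank_enrichment(ranked_genes, gene_set):
    """Score how strongly gene_set is concentrated near the top of ranked_genes.

    ranked_genes: list of unique gene IDs, best-ranked first.
    gene_set: set of query gene IDs.
    Returns (u, z): the Mann-Whitney U statistic and a normal-approximation
    z-score, where a positive z means the set is shifted toward the top.
    Assumption: this is a generic rank-sum test, not the paper's own statistic.
    """
    n = len(ranked_genes)
    hits = [gene in gene_set for gene in ranked_genes]
    n_in = sum(hits)
    n_out = n - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("gene_set must overlap the ranked list only partially")

    # Ranks run 1..n with 1 as the top of the list; positions are distinct,
    # so there are no ties to correct for.
    rank_sum = sum(pos + 1 for pos, hit in enumerate(hits) if hit)

    # U counts (member, non-member) pairs in which the non-member ranks higher.
    u = rank_sum - n_in * (n_in + 1) / 2

    mean_u = n_in * n_out / 2
    sd_u = math.sqrt(n_in * n_out * (n + 1) / 12)
    z = (mean_u - u) / sd_u
    return u, z


if __name__ == "__main__":
    ranked = [f"g{i}" for i in range(1, 11)]  # hypothetical ranked gene list
    u, z = rank_enrichment(ranked, {"g1", "g2", "g3"})
    print(f"U = {u}, z = {z:.2f}")
```

In a meta-analysis setting, a score like this would be computed per dataset and then combined across datasets; how the paper performs that combination is not described in the abstract, so it is left out here.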

Pages: 13
