An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era

Cited by: 114
Authors
Su, Zhenqiang [1, 2]
Fang, Hong [1]
Hong, Huixiao [1]
Shi, Leming [3, 4, 5, 6]
Zhang, Wenqian [1]
Zhang, Wenwei [6, 7]
Zhang, Yanyan [7]
Dong, Zirui [7, 8]
Lancashire, Lee J. [3]
Bessarabova, Marina [2]
Yang, Xi [1]
Ning, Baitang [1]
Gong, Binsheng [1]
Meehan, Joe [1]
Xu, Joshua [1]
Ge, Weigong [1]
Perkins, Roger [1]
Fischer, Matthias [8, 9]
Tong, Weida [1]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] Thomson Reuters, IP & Sci, Boston, MA 02210 USA
[3] Fudan Univ, Sch Life Sci & Pharm, State Key Lab Genet Engn, Shanghai 201203, Peoples R China
[4] Fudan Univ, Sch Life Sci & Pharm, MOE Key Lab Contemporary Anthropol, Shanghai 201203, Peoples R China
[5] Fudan Zhangjiang Ctr Clin Genom, Shanghai 201203, Peoples R China
[6] Zhangjiang Ctr Translat Med, Shanghai 201203, Peoples R China
[7] BGI Shenzhen, Guangdong 518083, Peoples R China
[8] Univ Childrens Hosp Cologne, Dept Pediat Oncol & Hematol, D-50924 Cologne, Germany
[9] Univ Childrens Hosp Cologne, Ctr Mol Med CMMC, D-50924 Cologne, Germany
Source
GENOME BIOLOGY | 2014, Vol. 15, Issue 12
Funding
U.S. National Science Foundation; National High Technology Research and Development Program of China (863 Program)
Keywords
GENE-EXPRESSION SIGNATURE; REPRODUCIBILITY
DOI
10.1186/s13059-014-0523-y
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Gene expression microarrays have been the primary biomarker platform in biomedical research, and their ubiquitous use has produced an enormous accumulation of data, predictive models, and biomarkers. RNA-seq now appears likely to replace microarrays, but there will be a period in which both technologies coexist. This raises two important questions. Can microarray-based models and biomarkers be applied directly to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment?
Results: We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, revealing three levels of mapping complexity. Three algorithms representing different levels of modeling complexity were applied to the three levels of mapping for each of the eight binary endpoints, and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined.
Conclusions: Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples. In contrast, RNA-seq-based models are less accurate in predicting microarray-profiled samples, and their accuracy is affected by both the choice of modeling algorithm and the gene mapping complexity. The results suggest that legacy microarray data and established microarray biomarkers and predictive models will remain useful in the forthcoming RNA-seq era.
Pages: 26