A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data

Cited by: 10
Authors
Raman, Pichai [1, 2, 8]
Zimmerman, Samuel [3]
Rathi, Komal S. [2, 8]
de Torrente, Laurence [3, 11]
Sarmady, Mahdi [4, 9]
Wu, Chao [4]
Leipzig, Jeremy [4, 5]
Taylor, Deanne M. [2, 10]
Tozeren, Aydin [1]
Mar, Jessica C. [3, 6, 7]
Affiliations
[1] Drexel Univ, Sch Biomed Engn Sci & Hlth Syst, Philadelphia, PA 19104 USA
[2] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
[3] Albert Einstein Coll Med, Dept Syst & Computat Biol, Bronx, NY 10467 USA
[4] Childrens Hosp Philadelphia, Dept Pathol & Lab Med, Div Genom Diagnost, Philadelphia, PA 19104 USA
[5] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA
[6] Albert Einstein Coll Med, Dept Epidemiol & Populat Hlth, Bronx, NY 10467 USA
[7] Univ Queensland, Australian Inst Bioengn & Nanotechnol, Brisbane, Qld, Australia
[8] Childrens Hosp Philadelphia, Ctr Data Driven Discovery Biomed, Philadelphia, PA 19104 USA
[9] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[10] Univ Penn, Dept Pediat, Perelman Sch Med, Philadelphia, PA 19104 USA
[11] New York Genome Ctr, New York, NY USA
Funding
Australian Research Council;
Keywords
Survival analysis; Kaplan-Meier; TCGA; Cancer; Gene expression; PROSTATE-CANCER; TRANSITION; NORMALITY; PROFILES; INDEX;
DOI
10.1016/j.cancergen.2019.04.004
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214 ;
Abstract
Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, biomarker identification to predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index (C-index), D-index, 25th-75th percentile split, median split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas (TCGA). The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance, while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
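To make the contrast between dichotomizing and rank-based methods concrete, the following is a minimal sketch of two of the eight approaches named in the abstract: a median split and the concordance index (Harrell's C). It is not the authors' implementation. The function names, the toy survival times, the censoring flags, and the expression values are all hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations
from statistics import median


def median_split(expression):
    """Dichotomize a continuous expression vector at its median (1 = high, 0 = low)."""
    cut = median(expression)
    return [1 if x > cut else 0 for x in expression]


def concordance_index(times, events, scores):
    """Harrell's C: share of comparable patient pairs in which the patient
    with the higher score (here, expression) has the shorter survival time.

    times  -- follow-up times
    events -- 1 if the event (death) was observed, 0 if censored
    scores -- continuous risk scores, e.g. expression of one gene
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): ignore tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if scores[first] > scores[second]:
            concordant += 1.0
        elif scores[first] == scores[second]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: six patients, one gene.
times = [2, 4, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
expression = [9.1, 7.5, 8.0, 5.2, 4.8, 3.9]

groups = median_split(expression)
c_index = concordance_index(times, events, expression)
```

On this toy input the median split reduces each patient to a single high/low label, whereas the C-index uses the full ordering of expression values: 11 of the 12 comparable pairs are concordant. The abstract's finding that dichotomizing methods perform poorly fits this loss of information, although the sketch itself does not demonstrate that result.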
Pages: 1-12
Number of pages: 12