A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data

Cited by: 10
Authors
Raman, Pichai [1 ,2 ,8 ]
Zimmerman, Samuel [3 ]
Rathi, Komal S. [2 ,8 ]
de Torrente, Laurence [3 ,11 ]
Sarmady, Mahdi [4 ,9 ]
Wu, Chao [4 ]
Leipzig, Jeremy [4 ,5 ]
Taylor, Deanne M. [2 ,10 ]
Tozeren, Aydin [1 ]
Mar, Jessica C. [3 ,6 ,7 ]
Affiliations
[1] Drexel Univ, Sch Biomed Engn Sci & Hlth Syst, Philadelphia, PA 19104 USA
[2] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
[3] Albert Einstein Coll Med, Dept Syst & Computat Biol, Bronx, NY 10467 USA
[4] Childrens Hosp Philadelphia, Dept Pathol & Lab Med, Div Genom Diagnost, Philadelphia, PA 19104 USA
[5] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA
[6] Albert Einstein Coll Med, Dept Epidemiol & Populat Hlth, Bronx, NY 10467 USA
[7] Univ Queensland, Australian Inst Bioengn & Nanotechnol, Brisbane, Qld, Australia
[8] Childrens Hosp Philadelphia, Ctr Data Driven Discovery Biomed, Philadelphia, PA 19104 USA
[9] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[10] Univ Penn, Dept Pediat, Perelman Sch Med, Philadelphia, PA 19104 USA
[11] New York Genome Ctr, New York, NY USA
Funding
Australian Research Council;
Keywords
Survival analysis; Kaplan-Meier; TCGA; Cancer; Gene expression; PROSTATE-CANCER; TRANSITION; NORMALITY; PROFILES; INDEX;
DOI
10.1016/j.cancergen.2019.04.004
Chinese Library Classification number
R73 [Oncology];
Subject classification code
100214;
Abstract
Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, identifying biomarkers that predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index (C-index), D-index, 25th-75th percentile split, median split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas (TCGA). The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance, while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
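To make the contrast between dichotomizing and rank-based approaches concrete, the following is a minimal sketch of two of the eight methods named in the abstract: a median split and a concordance index computed in the style of Harrell's C. It is an illustration only, not the authors' implementation. The function names, the convention that higher expression means higher risk, and the six-sample toy cohort are all assumptions made up for this example.

```python
from statistics import median


def median_split(expression):
    """Dichotomize samples into 'high' / 'low' around the cohort median."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def concordance_index(times, events, scores):
    """Harrell-style C-index on the continuous score (no dichotomizing).

    A pair (i, j) is comparable when sample i has the shorter follow-up
    time and its event was observed (events[i] == 1). The pair counts as
    concordant when the shorter-lived sample also has the higher score.
    Tied scores count as half. Assumption: higher score = higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy cohort: follow-up time, event flag (1 = death observed,
# 0 = censored), and expression of a single gene per patient.
toy_times = [2, 4, 6, 8, 10, 12]
toy_events = [1, 1, 0, 1, 1, 0]
toy_expression = [9.1, 7.5, 8.0, 4.2, 3.9, 2.5]

if __name__ == "__main__":
    print(median_split(toy_expression))
    print(concordance_index(toy_times, toy_events, toy_expression))
```

The median split reduces each patient to one of two labels, discarding the ordering of expression values within each group, whereas the concordance index uses every comparable pair of patients. That loss of information is one plausible reading of why the abstract reports weaker performance for dichotomizing methods.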
Pages: 1-12
Number of pages: 12
Related papers
50 in total
  • [21] DFseq: Distribution-Free Method to Detect Differential Gene Expression for RNA-Sequencing Data
    Yang, Shengping
    Wachtel, Mitchell S.
    Wu, Jiangrong
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, 17 (02) : 558 - 565
  • [22] Demultiplexing of single-cell RNA-sequencing data using interindividual variation in gene expression
    Nassiri, Isar
    Kwok, Andrew J.
    Bhandari, Aneesha
    Bull, Katherine R.
    Garner, Lucy C.
    Klenerman, Paul
    Webber, Caleb
    Parkkinen, Laura
    Lee, Angela W.
    Wu, Yanxia
    Fairfax, Benjamin
    Knight, Julian C.
    Buck, David
    Piazza, Paolo
    BIOINFORMATICS ADVANCES, 2024, 4 (01):
  • [23] Isoform-level gene expression patterns in single-cell RNA-sequencing data
    Trung Nghia Vu
    Wills, Quin F.
    Kalari, Krishna R.
    Niu, Nifang
    Wang, Liewei
    Pawitan, Yudi
    Rantalainen, Mattias
    BIOINFORMATICS, 2018, 34 (14) : 2392 - 2400
  • [24] Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data
    Jong Kyoung Kim
    John C Marioni
    Genome Biology, 14
  • [25] Expression variation analysis for tumor heterogeneity in single-cell RNA-sequencing data
    Davis-Marcisak, Emily F.
    Orugunta, Pranay
    Stein-O'Brien, Genevieve
    Puram, Sidharth V.
    Torres, Evanthia Roussos
    Hopkins, Alexander
    Jaffee, Elizabeth M.
    Favorov, Alexander V.
    Afsari, Bahman
    Goff, Loyal A.
    Fertig, Elana J.
    CANCER RESEARCH, 2019, 79 (13)
  • [26] Analysis of lncRNA expression in patients with metabolic syndrome: an investigation based on RNA-sequencing data
    Oyaci, Yasemin
    Senkal, Naci
    Medetalibeyoglu, Alpay
    Tuncel, Fatima Ceren
    Kose, Murat
    Tukek, Tufan
    Pehlivan, Sacide
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2024, 32 : 1407 - 1407
  • [27] Cancer Survival Analysis using RNA Sequencing and Clinical Data
    Clayman, Carly L.
    Srinivasan, Satish M.
    Sangwan, Raghvinder S.
    COMPLEX ADAPTIVE SYSTEMS, 2020, 168 : 80 - 87
  • [28] Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data
    Assefa, Alemu Takele
    De Paepe, Katrijn
    Everaert, Celine
    Mestdagh, Pieter
    Thas, Olivier
    Vandesompele, Jo
    GENOME BIOLOGY, 2018, 19
  • [29] Comprehensive comparative analysis of 5'-end RNA-sequencing methods
    Adiconis, Xian
    Haber, Adam L.
    Simmons, Sean K.
    Moonshine, Ami Levy
    Ji, Zhe
    Busby, Michele A.
    Shi, Xi
    Jacques, Justin
    Lancaster, Madeline A.
    Pan, Jen Q.
    Regev, Aviv
    Levin, Joshua Z.
    NATURE METHODS, 2018, 15 (07) : 505 - +
  • [30] RNA-sequencing profiles hippocampal gene expression in a validated model of cancer-induced depression
    Nashed, M. G.
    Linher-Melville, K.
    Frey, B. N.
    Singh, G.
    GENES BRAIN AND BEHAVIOR, 2016, 15 (08) : 711 - 721