A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data

Cited by: 10
Authors
Raman, Pichai [1 ,2 ,8 ]
Zimmerman, Samuel [3 ]
Rathi, Komal S. [2 ,8 ]
de Torrente, Laurence [3 ,11 ]
Sarmady, Mahdi [4 ,9 ]
Wu, Chao [4 ]
Leipzig, Jeremy [4 ,5 ]
Taylor, Deanne M. [2 ,10 ]
Tozeren, Aydin [1 ]
Mar, Jessica C. [3 ,6 ,7 ]
Affiliations
[1] Drexel Univ, Sch Biomed Engn Sci & Hlth Syst, Philadelphia, PA 19104 USA
[2] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
[3] Albert Einstein Coll Med, Dept Syst & Computat Biol, Bronx, NY 10467 USA
[4] Childrens Hosp Philadelphia, Dept Pathol & Lab Med, Div Genom Diagnost, Philadelphia, PA 19104 USA
[5] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA
[6] Albert Einstein Coll Med, Dept Epidemiol & Populat Hlth, Bronx, NY 10467 USA
[7] Univ Queensland, Australian Inst Bioengn & Nanotechnol, Brisbane, Qld, Australia
[8] Childrens Hosp Philadelphia, Ctr Data Driven Discovery Biomed, Philadelphia, PA 19104 USA
[9] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[10] Univ Penn, Dept Pediat, Perelman Sch Med, Philadelphia, PA 19104 USA
[11] New York Genome Ctr, New York, NY USA
Funding
Australian Research Council
Keywords
Survival analysis; Kaplan-Meier; TCGA; Cancer; Gene expression; PROSTATE-CANCER; TRANSITION; NORMALITY; PROFILES; INDEX;
DOI
10.1016/j.cancergen.2019.04.004
Chinese Library Classification (CLC) number
R73 [Oncology]
Subject classification code
100214
Abstract
Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, biomarker identification to predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index (C-index), D-index, 25th-75th percentile split, median split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas (TCGA). The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each method using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance, while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
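To make the contrast between continuous and dichotomized scoring concrete, the following is a minimal Python sketch, not the authors' implementation. It scores one hypothetical gene with Harrell's concordance index, first using the continuous expression values and then using a median split. The function names, the toy data, and the convention of counting tied risk scores as 0.5 are all assumptions made for illustration.

```python
from statistics import median


def median_split(expression):
    """Dichotomize expression values: 1 if above the median, else 0."""
    cutoff = median(expression)
    return [1 if x > cutoff else 0 for x in expression]


def concordance_index(risk, time, event):
    """Harrell's C-index for right-censored data.

    Returns the fraction of comparable patient pairs in which the patient
    with the higher risk score has the shorter survival time. Pairs with
    tied risk scores count as 0.5 (one common convention, assumed here).
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's event or censoring time.
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, made up for illustration: higher expression, shorter survival.
expression = [9.1, 7.4, 6.8, 5.2, 4.0, 2.5]
time_months = [5, 8, 12, 20, 24, 30]
event = [1, 1, 0, 1, 0, 1]  # 1 = death observed, 0 = censored

c_continuous = concordance_index(expression, time_months, event)
c_median = concordance_index(median_split(expression), time_months, event)
print(f"C-index, continuous expression: {c_continuous:.3f}")
print(f"C-index, median split:          {c_median:.3f}")
```

On this toy example the continuous values order every comparable pair correctly, while the median split ties all patients within each half and so scores lower. That illustrates how dichotomizing discards ordering information, but it says nothing about the size of the effect on real TCGA data.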
Pages: 1-12
Page count: 12