Finding the active genes in deep RNA-seq gene expression studies

Times cited: 175
Authors
Hart, Traver [1]
Komori, H. Kiyomi [2]
LaMere, Sarah [2]
Podshivalova, Katie [2]
Salomon, Daniel R. [2]
Affiliations
[1] Univ Toronto, Banting & Best Dept Med Res, Donnelly Ctr, Toronto, ON, Canada
[2] Scripps Res Inst, Dept Mol & Expt Med, La Jolla, CA 92037 USA
Source
BMC GENOMICS | 2013 / Vol. 14
Funding
National Institutes of Health (NIH), USA;
Keywords
QUANTIFICATION; TRANSCRIPTOME;
DOI
10.1186/1471-2164-14-778
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Background: Early application of second-generation sequencing technologies to transcript quantitation (RNA-seq) has hinted at a vast mammalian transcriptome, including transcripts from nearly all known genes, which might be fully measured only by ultradeep sequencing. Subsequent studies suggested that low-abundance transcripts might be the result of technical or biological noise rather than active transcripts; moreover, most RNA-seq experiments did not provide enough read depth to generate high-confidence estimates of gene expression for low-abundance transcripts. As a result, the community adopted several heuristics for RNA-seq analysis, most notably an arbitrary expression threshold of 0.3 to 1 FPKM for downstream analysis. However, advances in RNA-seq library preparation, sequencing technology, and informatic analysis have addressed many of the systemic sources of uncertainty and undermined the assumptions that drove the adoption of these heuristics. We provide an updated view of the accuracy and efficiency of RNA-seq experiments, using genomic data from large-scale studies like the ENCODE project to provide orthogonal information against which to validate our conclusions.

Results: We show that a human cell's transcriptome can be divided into active genes carrying out the work of the cell and other genes that are likely the by-products of biological or experimental noise. We use ENCODE data on chromatin state to show that ultralow-expression genes are predominantly associated with repressed chromatin; we provide a novel normalization metric, zFPKM, that identifies the threshold between active and background gene expression; and we show that this threshold is robust to experimental and analytical variations.

Conclusions: The zFPKM normalization method accurately separates the biologically relevant genes in a cell, which are associated with active promoters, from the ultralow-expression noisy genes that have repressed promoters. A read depth of twenty to thirty million mapped reads allows high-confidence quantitation of genes expressed at this threshold, providing important guidance for the design of RNA-seq studies of gene expression. Moreover, we offer an example for using extensive ENCODE chromatin state information to validate RNA-seq analysis pipelines.
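The abstract names zFPKM but does not describe how it is computed. The following is a minimal Python sketch that rests on one stated assumption: zFPKM takes log2 FPKM values, locates the peak of their kernel density estimate as the centre μ of the active-gene population, estimates the spread σ by treating the values above that peak as the right half of a Gaussian, and reports (log2 FPKM − μ)/σ. The function names `zfpkm` and `gaussian_kde`, the `grid_points` parameter and the Silverman bandwidth are illustrative choices made for this sketch, not details taken from the paper.

```python
import math
from statistics import mean, stdev


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def zfpkm(fpkm, grid_points=512):
    """Hypothetical helper: map FPKM values to zFPKM-style z-scores.

    Returns (scores, mu, sigma). Genes with FPKM <= 0 get -inf because
    log2(0) is undefined.
    """
    log_vals = [math.log2(x) for x in fpkm if x > 0]
    if len(log_vals) < 2:
        raise ValueError("need at least two genes with FPKM > 0")
    # Bandwidth by Silverman's rule of thumb (an assumption of this sketch).
    bw = 1.06 * stdev(log_vals) * len(log_vals) ** -0.2
    if bw == 0:
        raise ValueError("all non-zero FPKM values are identical")
    lo, hi = min(log_vals) - 3 * bw, max(log_vals) + 3 * bw
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    density = gaussian_kde(log_vals, grid, bw)
    # mu: the log2 FPKM at the highest peak of the estimated density.
    mu = grid[max(range(grid_points), key=density.__getitem__)]
    # Treat values above mu as the right half of N(mu, sigma^2);
    # a half-normal has mean mu + sigma * sqrt(2 / pi), so invert that.
    upper = [v for v in log_vals if v > mu]
    if not upper:
        raise ValueError("no values above the density peak")
    sigma = (mean(upper) - mu) * math.sqrt(math.pi / 2.0)
    scores = [(math.log2(x) - mu) / sigma if x > 0 else -math.inf for x in fpkm]
    return scores, mu, sigma
```

A cutoff on these scores would then separate active from background genes; the abstract does not state the cutoff value, so none is hard-coded here. The pure-Python density estimate keeps the sketch dependency-free but is slow for genome-scale inputs, where a vectorized estimator such as `scipy.stats.gaussian_kde` would be the practical choice.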
Pages: 7