Read Mapping and Transcript Assembly: A Scalable and High-Throughput Workflow for the Processing and Analysis of Ribonucleic Acid Sequencing Data

Cited by: 15
Authors
Peri, Sateesh [1]
Roberts, Sarah [2]
Kreko, Isabella R. [3]
McHan, Lauren B. [3]
Naron, Alexandra [3]
Ram, Archana [3]
Murphy, Rebecca L. [4]
Lyons, Eric [1, 2]
Gregory, Brian D. [5]
Devisetty, Upendra K. [2]
Nelson, Andrew D. L. [6]
Affiliations
[1] Univ Arizona, Genet Grad Interdisciplinary Grp, Tucson, AZ USA
[2] Univ Arizona, CyVerse, Tucson, AZ USA
[3] Univ Arizona, Sch Plant Sci, LIVE For Plants Summer Res Program, Tucson, AZ USA
[4] Centenary Coll Louisiana, Biol Dept, Shreveport, LA USA
[5] Univ Penn, Dept Biol, Philadelphia, PA 19104 USA
[6] Cornell Univ, Boyce Thompson Inst, Ithaca, NY 14850 USA
Funding
U.S. National Science Foundation
Keywords
RNA-seq; transcriptomics; high-throughput (-omics) techniques; bioinformatics; workflow; expression analysis; Arabidopsis; CoGe
DOI
10.3389/fgene.2019.01361
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Next-generation RNA sequencing (RNA-seq) is a powerful means of generating a snapshot of the transcriptomic state within a cell, tissue, or whole organism. As the questions addressed by RNA-seq become both more complex and more numerous, there is a need to simplify RNA-seq processing workflows, make them more efficient and interoperable, and make them capable of handling both large and small datasets. This is especially important for researchers who need to process hundreds to tens of thousands of RNA-seq datasets. To address these needs, we have developed a scalable, user-friendly, and easily deployable analysis suite called RMTA (Read Mapping, Transcript Assembly). RMTA can process thousands of RNA-seq datasets, with features that include automated read quality analysis, filters for lowly expressed transcripts, and read counting for differential expression analysis. RMTA is containerized using Docker for easy deployment within any compute environment [cloud, local, or high-performance computing (HPC)] and is available as two apps in CyVerse's Discovery Environment, one for normal use and one specifically designed for introducing undergraduate and high school students to RNA-seq analysis. For extremely large datasets (tens of thousands of FASTQ files), we developed a high-throughput, scalable, and parallelized version of RMTA optimized for launching on the Open Science Grid (OSG) from within the Discovery Environment. OSG-RMTA lets users rely on the Discovery Environment for data management, parallelization, and job submission to the OSG, and then on the OSG itself for distributed, high-throughput computing. Alternatively, OSG-RMTA can be run directly on the OSG through the command line. RMTA is designed to be useful for data scientists of any skill level who are interested in rapidly and reproducibly analyzing their large RNA-seq datasets.
Pages: 9