Read Mapping and Transcript Assembly: A Scalable and High-Throughput Workflow for the Processing and Analysis of Ribonucleic Acid Sequencing Data

Cited by: 15

Authors:

Peri, Sateesh [1]

Roberts, Sarah [2]

Kreko, Isabella R. [3]

McHan, Lauren B. [3]

Naron, Alexandra [3]

Ram, Archana [3]

Murphy, Rebecca L. [4]

Lyons, Eric [1,2]

Gregory, Brian D. [5]

Devisetty, Upendra K. [2]

Nelson, Andrew D. L. [6]

Affiliations:

[1] Univ Arizona, Genet Grad Interdisciplinary Grp, Tucson, AZ USA

[2] Univ Arizona, CyVerse, Tucson, AZ USA

[3] Univ Arizona, Sch Plant Sci, LIVE For Plants Summer Res Program, Tucson, AZ USA

[4] Centenary Coll Louisiana, Biol Dept, Shreveport, LA USA

[5] Univ Penn, Dept Biol, Philadelphia, PA 19104 USA

[6] Cornell Univ, Boyce Thompson Inst, Ithaca, NY 14850 USA

Source:

FRONTIERS IN GENETICS | 2020 / Vol. 10

Funding:

National Science Foundation (USA);

Keywords:

RNA-seq; transcriptomics; high throughput (-omics) techniques; bioinformatics; workflow; EXPRESSION ANALYSIS; ARABIDOPSIS; COGE;

DOI:

10.3389/fgene.2019.01361

Chinese Library Classification number:

Q3 [Genetics];

Subject classification code:

071007 ; 090102 ;

Abstract:

Next-generation RNA-sequencing is an incredibly powerful means of generating a snapshot of the transcriptomic state within a cell, tissue, or whole organism. As the questions addressed by RNA-sequencing (RNA-seq) become both more complex and greater in number, there is a need to simplify RNA-seq processing workflows, make them more efficient and interoperable, and make them capable of handling both large and small datasets. This is especially important for researchers who need to process hundreds to tens of thousands of RNA-seq datasets. To address these needs, we have developed a scalable, user-friendly, and easily deployable analysis suite called RMTA (Read Mapping, Transcript Assembly). RMTA can easily process thousands of RNA-seq datasets with features that include automated read quality analysis, filters for lowly expressed transcripts, and read counting for differential expression analysis. RMTA is containerized using Docker for easy deployment within any compute environment [cloud, local, or high-performance computing (HPC)] and is available as two apps in CyVerse's Discovery Environment, one for normal use and one specifically designed for introducing undergraduates and high school students to RNA-seq analysis. For extremely large datasets (tens of thousands of FASTQ files) we developed a high-throughput, scalable, and parallelized version of RMTA optimized for launching on the Open Science Grid (OSG) from within the Discovery Environment. OSG-RMTA allows users to utilize the Discovery Environment for data management, parallelization, and job submission to the OSG, and then to employ the OSG for distributed, high-throughput computing. Alternatively, OSG-RMTA can be run directly on the OSG through the command line. RMTA is designed to be useful for data scientists of any skill level who are interested in rapidly and reproducibly analyzing their large RNA-seq datasets.

Pages: 9
