PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cited by: 4
Authors
Hong, Changjin [1 ,2 ]
Manimaran, Solaiappan [1 ]
Johnson, William [1 ]
Affiliations
[1] Boston Univ, Sch Med, Div Computat Biomed, Boston, MA 02215 USA
[2] Nationwide Childrens Hosp, Cytogenet Mol Genet Lab, Columbus, OH 43205 USA
Funding
National Institutes of Health (USA);
Keywords
sequencing read preprocessing; sequencing quality control; parallel processing; metagenomics;
DOI
10.4137/CIN.S13890
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/.
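As an illustration of the kind of parallel read preprocessing the abstract describes, the following is a minimal sketch only. It is not PathoQC's code or API: the function names `trim_read` and `preprocess`, the thresholds, and the assumption of Phred+33 quality encoding are all hypothetical choices made for this example. It trims low-quality bases from the 3' end of each read and drops reads that become too short, optionally spreading the work over several processes.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def trim_read(read, min_quality=20, min_length=5):
    """Trim low-quality bases from the 3' end of one read.

    `read` is a (name, sequence, quality_string) tuple. Returns the trimmed
    tuple, or None if fewer than `min_length` bases remain.
    """
    name, seq, qual = read
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_quality:
        end -= 1
    if end < min_length:
        return None
    return (name, seq[:end], qual[:end])


def preprocess(reads, workers=1, **kwargs):
    """Apply trim_read to every read, in parallel when workers > 1."""
    worker = partial(trim_read, **kwargs)
    if workers <= 1:
        results = map(worker, reads)
    else:
        # Each worker process handles chunks of reads independently.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(worker, reads, chunksize=1000))
    return [r for r in results if r is not None]


if __name__ == "__main__":
    example_reads = [
        ("r1", "ACGTACGTAC", "IIIIIIII##"),  # last two bases are low quality
        ("r2", "ACGT", "####"),              # entirely low quality
    ]
    print(preprocess(example_reads, workers=2))
```

Because each read is processed independently, this kind of task parallelizes cleanly across processes; a real tool would also need to handle FASTQ parsing, paired-end reads, and adapter removal, which this sketch leaves out.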
Pages: 167-176
Page count: 10