PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cited by: 4
Authors
Hong, Changjin [1, 2]
Manimaran, Solaiappan [1]
Johnson, William [1]
Affiliations
[1] Boston Univ, Sch Med, Div Computat Biomed, Boston, MA 02215 USA
[2] Nationwide Childrens Hosp, Cytogenet Mol Genet Lab, Columbus, OH 43205 USA
Funding
National Institutes of Health (USA)
Keywords
sequencing read preprocessing; sequencing quality control; parallel processing; metagenomics
DOI
10.4137/CIN.S13890
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/.
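As a rough illustration of the kind of parallel read preprocessing the abstract describes, the sketch below trims low-quality bases from the 3' end of each read and discards reads that become too short, optionally spreading the work across processes. It is not PathoQC's implementation or API: the function names `trim_read` and `preprocess`, the thresholds, the chunk size, and the Phred+33 quality encoding are all assumptions chosen for illustration.

```python
from functools import partial
from multiprocessing import Pool

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def trim_read(record, min_qual=20, min_len=30):
    """Trim low-quality bases from the 3' end; return None if too short.

    `record` is a hypothetical (name, sequence, quality_string) tuple.
    """
    name, seq, qual = record
    end = len(seq)
    # Walk back from the 3' end while the base quality is below threshold.
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_qual:
        end -= 1
    if end < min_len:
        return None
    return (name, seq[:end], qual[:end])


def preprocess(records, min_qual=20, min_len=30, workers=1):
    """Trim and filter reads, optionally using several worker processes."""
    worker = partial(trim_read, min_qual=min_qual, min_len=min_len)
    if workers <= 1:
        results = map(worker, records)
    else:
        # Reads are independent, so they can be processed in parallel chunks.
        with Pool(processes=workers) as pool:
            results = pool.map(worker, records, chunksize=1000)
    return [r for r in results if r is not None]


if __name__ == "__main__":
    reads = [
        ("r1", "ACGTACGT", "IIIIII##"),  # last two bases are low quality
        ("r2", "ACGT", "####"),          # entirely low quality
    ]
    for name, seq, qual in preprocess(reads, min_len=4, workers=2):
        print(name, seq, qual)
```

Because each read is handled independently, the per-read function can be mapped over the input with no shared state, which is what makes this step straightforward to parallelize.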
Pages: 167-176
Number of pages: 10