PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cited by: 4
Authors
Hong, Changjin [1, 2]
Manimaran, Solaiappan [1 ]
Johnson, William [1 ]
Affiliations
[1] Boston Univ, Sch Med, Div Computat Biomed, Boston, MA 02215 USA
[2] Nationwide Childrens Hosp, Cytogenet Mol Genet Lab, Columbus, OH 43205 USA
Funding
National Institutes of Health
Keywords
sequencing read preprocessing; sequencing quality control; parallel processing; metagenomics
DOI
10.4137/CIN.S13890
Chinese Library Classification
R73 [Oncology]
Discipline code
100214
Abstract
Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/.
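The abstract describes read preprocessing that removes technical sequencing artifacts and runs in parallel. The sketch below shows one common operation of that kind under stated assumptions: trimming low-quality bases from the 3' end of FASTQ reads. This is not PathoQC's implementation or API. The functions parse_fastq, trim_read, and trim_reads are hypothetical, as are the thresholds min_qual=20 and min_len=30 and the Phred+33 quality encoding. Parallelism is illustrated with Python's standard multiprocessing.Pool.

```python
from functools import partial
from multiprocessing import Pool
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical record type (not PathoQC's): (header, sequence, quality string).
Read = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def trim_read(read: Read, min_qual: int = 20, min_len: int = 30,
              offset: int = 33) -> Optional[Read]:
    """Trim 3' bases below min_qual; drop the read if fewer than min_len remain.

    Assumes Phred+33 encoding (offset=33); the thresholds are illustrative only.
    """
    header, seq, qual = read
    end = len(qual)
    # Walk back from the 3' end while the decoded Phred score is too low.
    while end > 0 and ord(qual[end - 1]) - offset < min_qual:
        end -= 1
    if end < min_len:
        return None
    return header, seq[:end], qual[:end]


def trim_reads(reads: List[Read], workers: int = 1, **params: int) -> List[Read]:
    """Apply trim_read to every read, optionally across worker processes."""
    worker = partial(trim_read, **params)
    if workers > 1:
        # Distribute reads to a process pool in chunks to limit IPC overhead.
        with Pool(processes=workers) as pool:
            results = pool.map(worker, reads, chunksize=1000)
    else:
        results = list(map(worker, reads))
    # Drop reads that were discarded as too short after trimming.
    return [r for r in results if r is not None]
```

As an example, take min_qual=20 and min_len=5 with a 10-base read whose quality string is IIIIIIII##. The read keeps its first 8 bases, because '#' decodes to Phred 2 under the +33 offset. A read whose qualities are all '#' is discarded. Passing workers > 1 spreads the per-read work across processes. A production tool would also stream reads in chunks rather than holding them all in memory.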
Pages: 167-176
Number of pages: 10