Automated sequence preprocessing in a large-scale sequencing environment

Cited by: 29
Authors
Wendl, MC [1]
Dear, S
Hodgson, D
Hillier, L
Affiliations
[1] Washington Univ, Genome Sequencing Ctr, St Louis, MO 63108 USA
[2] Sanger Ctr, Cambridge CB10 1SA, England
Source
GENOME RESEARCH | 1998 / Volume 8 / Issue 09
Funding
Wellcome Trust (UK);
DOI
10.1101/gr.8.9.975
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
A software system for transforming fragments from four-color fluorescence-based gel electrophoresis experiments into assembled sequence is described. It has been developed for large-scale processing of all trace data, including shotgun and finishing reads, regardless of clone origin. Design considerations are discussed in detail, as are programming implementation and graphic tools. The importance of input validation, record tracking, and use of base quality values is emphasized. Several quality analysis metrics are proposed and applied to sample results from recently sequenced clones. Such quantities prove to be a valuable aid in evaluating modifications of sequencing protocol. The system is in full production use at both the Genome Sequencing Center and the Sanger Centre, for which combined production is approximately 100,000 sequencing reads per week.
Pages: 975-984
Number of pages: 10