PASQUAL: Parallel Techniques for Next Generation Genome Sequence Assembly

Cited by: 11
Authors
Liu, Xing [1]
Pande, Pushkar R. [2]
Meyerhenke, Henning [3]
Bader, David A. [4]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Coll Comp, Atlanta, GA 30309 USA
[2] Georgia Inst Technol, Coll Comp, Sch Computat Sci & Engn, Newark, CA 94560 USA
[3] Karlsruhe Inst Technol KIT, Inst Theoret Informat, D-76128 Karlsruhe, Germany
[4] Georgia Inst Technol, Sch Computat Sci & Engn, Coll Comp, Atlanta, GA 30332 USA
Funding
US National Science Foundation
Keywords
Parallel algorithms; de novo sequence assembly; parallel suffix array construction; shared memory parallelism; high-performance bioinformatics; ALGORITHMS; MILLIONS; GRAPHS; READS;
DOI
10.1109/TPDS.2012.190
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The study of genomes has been revolutionized by sequencing machines that output many short overlapping substrings (called reads). The task of sequence assembly in practice is to reconstruct long contiguous genome subsequences from the reads. With Next Generation Sequencing (NGS) technologies, assembly software needs to be more accurate, faster, and more memory-efficient due to the problem complexity and the size of the data sets. In this paper, we develop parallel algorithms and compressed data structures to address several computational challenges of NGS assembly. We demonstrate how commonly available multicore architectures can be efficiently utilized for sequence assembly. In all stages (indexing input strings, string graph construction and simplification, extraction of contiguous subsequences) of our software PASQUAL, we use shared-memory parallelism to speed up the assembly process. In our experiments with data of up to 6.8 billion base pairs, we demonstrate that PASQUAL generally delivers the best tradeoff between speed, memory consumption, and solution quality. On synthetic and real data sets, PASQUAL scales well with an increasing number of threads on our test machine with 40 CPU cores. Given enough cores, PASQUAL is fastest in our comparison.
Pages: 977-986
Page count: 10