Optical maps;
Rmaps;
spaced seeds;
space efficient indexing;
ALIGNMENT;
MAPS;
DOI:
10.1109/TCBB.2021.3085086
Chinese Library Classification:
Q5 [Biochemistry];
Discipline classification codes:
071010;
081704;
Abstract:
A key problem in processing raw optical mapping data (Rmaps) is finding Rmaps that originate from the same genomic region. These sets of related Rmaps can be used to correct errors in Rmap data and to find overlaps between Rmaps in order to assemble consensus optical maps. Previous Rmap overlap aligners are computationally very expensive and do not scale to large eukaryotic data sets. We present SELKIE, an Rmap overlap aligner based on a spaced (l,k)-mer index, which was pioneered in the Rmap error correction tool ELMERI. Here we present a space-efficient version of the index which is twice as fast as prior art while using just a quarter of the memory on a human data set. Moreover, our index can be used for filtering candidates for Rmap overlap computation, whereas ELMERI used the index only for error correction of Rmaps. By combining our filtering of Rmaps with the exhaustive, but highly accurate, algorithm of Valouev et al. (2006), SELKIE maintains or increases the accuracy of finding overlapping Rmaps on a bacterial data set while being at least four times faster. Furthermore, for finding overlaps in a human data set, SELKIE is up to two orders of magnitude faster than previous methods.
Affiliation:
Univ Helsinki, Helsinki Inst Informat Technol, Dept Comp Sci, FI-00014 Helsinki, Finland
Puglisi, Simon J.
Muggli, Martin D.
Affiliation:
Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Affiliations:
Kyoto Univ, Grad Sch Informat, Dept Intelligence Sci & Technol, Sakyo Ku, Kyoto 6068501, Japan
Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Koto Ku, Tokyo 1350064, Japan