Space-Efficient Indexing of Spaced Seeds for Accurate Overlap Computation of Raw Optical Mapping Data

Cited by: 0
Authors
Walve, Riku [1]
Puglisi, Simon J. [1]
Salmela, Leena [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki Inst Informat Technol HIIT, Helsinki 00100, Finland
Funding
Academy of Finland;
Keywords
Optical maps; Rmaps; spaced seeds; space efficient indexing; ALIGNMENT; MAPS;
DOI
10.1109/TCBB.2021.3085086
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
A key problem in processing raw optical mapping data (Rmaps) is finding Rmaps originating from the same genomic region. These sets of related Rmaps can be used to correct errors in Rmap data, and to find overlaps between Rmaps to assemble consensus optical maps. Previous Rmap overlap aligners are computationally very expensive and do not scale to large eukaryotic datasets. We present SELKIE, an Rmap overlap aligner based on a spaced (l,k)-mer index, which was pioneered in the Rmap error correction tool ELMERI. Here we present a space-efficient version of the index which is twice as fast as prior art while using just a quarter of the memory on a human dataset. Moreover, our index can be used for filtering candidates for Rmap overlap computation, whereas ELMERI used the index only for error correction of Rmaps. By combining our filtering of Rmaps with the exhaustive, but highly accurate, algorithm of Valouev et al. (2006), SELKIE maintains or increases the accuracy of finding overlapping Rmaps on a bacterial dataset while being at least four times faster. Furthermore, for finding overlaps in a human dataset, SELKIE is up to two orders of magnitude faster than previous methods.
Pages: 2454-2462
Page count: 9