Searching for supermaximal repeats in large DNA sequences

Cited by: 0
Authors
Lian, Chen Na [1]
Halachev, Mihail [1]
Shiri, Nematollaah [1]
Affiliations
[1] Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada
Keywords
DNA sequences; supermaximal repeats; suffix tree; performance
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
We study the problem of finding supermaximal repeats in large DNA sequences. For this, we propose an algorithm called SMR, which uses an auxiliary index structure (POL) that is derived from, and replaces, the suffix tree index ST-FD64 [1]. The results of our numerous experiments on data from the 24 human chromosomes indicate that SMR outperforms the solution provided as part of the Vmatch [2] software tool. In searching for supermaximal repeats of at least 10 bases, SMR is twice as fast as Vmatch; for a minimum length of 25 bases, SMR is 7 times faster; and for repeats of at least 200 bases, SMR is about 9 times faster. We also study the cost of POL in terms of time and space requirements.
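To make the term concrete: a supermaximal repeat is a substring that occurs at least twice and is not contained in any longer repeated substring. The following is a minimal brute-force sketch of that definition only. It is not the SMR algorithm, and it does not use POL, ST-FD64, or Vmatch; the function name `supermaximal_repeats` and its `min_len` parameter are hypothetical, chosen for illustration. It stores every substring, so it is suitable only for very short sequences.

```python
from collections import defaultdict


def supermaximal_repeats(seq, min_len=1):
    """Return {repeat: [start positions]} for the supermaximal repeats of seq.

    Brute-force illustration of the definition (hypothetical helper, not SMR).
    Occurrences may overlap. Memory use is quadratic in len(seq).
    """
    n = len(seq)

    # Record the start positions of every substring of length >= min_len.
    positions = defaultdict(list)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            positions[seq[i:j]].append(i)

    # A repeat is any substring that occurs at least twice.
    repeats = {s: p for s, p in positions.items() if len(p) >= 2}

    # Every substring of a repeat is itself a repeat, so a repeat w lies
    # inside a longer repeat exactly when some repeat one character longer
    # has w as its prefix or suffix.
    covered = set()
    for u in repeats:
        covered.add(u[:-1])
        covered.add(u[1:])

    return {w: p for w, p in repeats.items() if w not in covered}


if __name__ == "__main__":
    print(supermaximal_repeats("GATTACAGATTA"))  # {'GATTA': [0, 7]}
```

In the example, shorter repeats such as `GA` or `TT` are dropped because each is contained in the longer repeat `GATTA`. Index-based methods such as the one described in the abstract exist precisely because this quadratic enumeration does not scale to chromosome-sized inputs.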
Pages: 87-101
Page count: 15