The WM-q multiple exact string matching algorithm for DNA sequences

Cited by: 9
Authors
Karcioglu, Abdullah Ammar [1]
Bulut, Hasan [1]
Affiliations
[1] Ege Univ, Dept Comp Engn, Izmir, Turkey
Keywords
Multiple string matching; DNA sequences; Sequence analysis; Hash function; Wu-Manber algorithm; PATTERN; SEARCH
DOI
10.1016/j.compbiomed.2021.104656
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
String matching algorithms are essential in many areas of computer science, such as text search, intrusion detection systems, fraud detection, and sequence search in bioinformatics. Exact string matching algorithms fall into two classes: single-pattern and multiple-pattern. Multiple string matching algorithms find the occurrences of the elements of a pattern set P in a given input text T. For DNA sequences, string matching must be time-efficient, because the total runtime grows as the volume of the text T and the number of search patterns increase. Efficient algorithms should therefore be selected to perform these search operations as quickly as possible. In this study, the Wu-Manber algorithm, one of the multiple exact string matching algorithms, is improved. Although the Wu-Manber algorithm is effective, it has some limitations, such as hash collisions. In this study, the WM-q algorithm, a version of the Wu-Manber algorithm based on a perfect hash function for DNA sequences, is proposed. String matching is performed using different block lengths provided by the perfect hash function instead of the fixed block length used in the traditional Wu-Manber algorithm. The proposed approach has been compared with other multiple exact string matching algorithms on the E. coli and Human Chromosome 1 datasets, which are frequently used in the literature. The proposed algorithm gives better results for performance metrics such as the average runtime and the average number of character and hash comparisons.
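The idea of a collision-free block hash inside a Wu-Manber style search can be illustrated with a minimal sketch. This is not the authors' WM-q implementation, which the abstract does not specify: the 2-bit base encoding, the function names (`qgram_hash`, `build_tables`, `search`), and the fixed block length `q` passed as a parameter are all assumptions made for illustration. Because the DNA alphabet has four letters, packing each base into 2 bits maps every q-gram to a distinct integer in [0, 4^q), so the shift table has no hash collisions.

```python
from collections import defaultdict

# Assumed 2-bit encoding of the DNA alphabet (input must contain only A, C, G, T).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def qgram_hash(s, start, q):
    """Map the q-gram s[start:start+q] to a unique integer in [0, 4**q)."""
    h = 0
    for ch in s[start:start + q]:
        h = (h << 2) | CODE[ch]
    return h


def build_tables(patterns, q):
    """Build the Wu-Manber shift table and the buckets of candidate patterns."""
    m = min(len(p) for p in patterns)  # only the first m characters drive the shifts
    if m < q:
        raise ValueError("every pattern must be at least q characters long")
    shift = [m - q + 1] * (4 ** q)     # default shift for q-grams absent from all patterns
    buckets = defaultdict(list)
    for p in patterns:
        for j in range(m - q + 1):
            h = qgram_hash(p, j, q)
            # Distance from the end of this q-gram to the end of the m-prefix.
            shift[h] = min(shift[h], m - q - j)
        # Patterns are grouped by the last q-gram of their m-prefix.
        buckets[qgram_hash(p, m - q, q)].append(p)
    return m, shift, buckets


def search(text, patterns, q=3):
    """Return (position, pattern) pairs for every exact occurrence in text."""
    m, shift, buckets = build_tables(patterns, q)
    hits = []
    pos = m - q  # start of the q-gram that ends the current length-m window
    while pos + q <= len(text):
        h = qgram_hash(text, pos, q)
        s = shift[h]
        if s == 0:
            # Possible match: verify each candidate pattern character by character.
            start = pos - (m - q)
            for p in buckets[h]:
                if text.startswith(p, start):
                    hits.append((start, p))
            s = 1
        pos += s
    return hits
```

Calling `search("ACGTACGTTGCA", ["CGTA", "TTGC", "GTACG"], q=2)` scans the text once for all three patterns. A larger `q` makes the table bigger (4^q entries) but tends to produce longer shifts, which is the kind of trade-off a choice of block length has to balance.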
Pages: 16
Related papers
50 in total
  • [1] The WM-q multiple exact string matching algorithm for DNA sequences
    Karcioglu, Abdullah Ammar
    Bulut, Hasan
    Computers in Biology and Medicine, 2021, 136
  • [2] q-gram hash comparison based multiple exact string matching algorithm for DNA sequences
    Karcioglu, Abdullah Ammar
    Bulut, Hasan
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2023, 38 (02) : 875 - 888
  • [3] Improving hash-q exact string matching algorithm with perfect hashing for DNA sequences
    Karcioglu, Abdullah Ammar
    Bulut, Hasan
    COMPUTERS IN BIOLOGY AND MEDICINE, 2021, 131
  • [4] Fast string matching for DNA sequences
    Ryu, Cheol
    Lecroq, Thierry
    Park, Kunsoo
    THEORETICAL COMPUTER SCIENCE, 2020, 812 : 137 - 148
  • [5] Approximate string matching in DNA sequences
    Cheng, LL
    Cheung, DW
    Yiu, SM
    EIGHTH INTERNATIONAL CONFERENCE ON DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2003, : 303 - 310
  • [6] WM+: An optimal multi-pattern string matching algorithm based on the WM algorithm
    Chen, XX
    Fang, BX
    Li, L
    Jiang, Y
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2005, 3756 : 515 - 523
  • [7] Comparison of exact string matching algorithms for biological sequences
    Kalsi, Petri
    Peltola, Hannu
    Tarhio, Jorma
    BIOINFORMATICS RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2008, 13 : 417 - 426
  • [8] A Lightweight Multiple String Matching Algorithm
    Dai, Liuling
    Xia, Yuning
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 611 - +
  • [9] An aggressive algorithm for multiple string matching
    Dai, Liuling
    INFORMATION PROCESSING LETTERS, 2009, 109 (11) : 553 - 559
  • [10] Parallel Processing of Hybrid Exact String Matching Algorithm
    Abdulrazzaq, Atheer Akram
    Rashid, Nur'Aini Abdul
    Alezzi, Ayad Hussain Abdulkader
    2013 IEEE INTERNATIONAL CONFERENCE ON CONTROL SYSTEM, COMPUTING AND ENGINEERING (ICCSCE 2013), 2013, : 203 - +