IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences

Cited by: 8
Authors
Alamro, Hayam [1, 2]
Alzamel, Mai [1, 3]
Iliopoulos, Costas S. [1]
Pissis, Solon P. [4, 5]
Watts, Steven [1]
Affiliations
[1] King's College London, Department of Informatics, 30 Aldwych, London, England
[2] Princess Nourah bint Abdulrahman University, Department of Information Systems, Riyadh, Saudi Arabia
[3] King Saud University, Computer Science Department, Riyadh, Saudi Arabia
[4] Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
[5] Vrije Universiteit Amsterdam, Amsterdam, Netherlands
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); European Union Horizon 2020
Keywords
Inverted repeat; Palindrome; Gaps; Mismatches; Software; IUPAC; Chromosome; Region; Cruciform; Xq13
DOI
10.1186/s12859-021-03983-2
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes, and they have been linked with a wide range of possible functions. Many international consortia provide a comprehensive description of common genetic variation, which makes alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets.

Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences that also allows for mismatches and gaps in the inverted repeats.

Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that it does so orders of magnitude faster.
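To make the definition in the abstract concrete, here is a minimal brute-force sketch of inverted-repeat search over IUPAC-encoded DNA with gaps and mismatches. It is not IUPACpal's algorithm, which the paper describes as efficient. This scan takes roughly O(n · max_gap · arm length) time. Several choices below are assumptions made for illustration and are not taken from IUPACpal:

- Two IUPAC symbols "pair" when some base of one is the Watson-Crick complement of some base of the other.
- Trailing mismatches at the outer ends of the arms are trimmed.
- The names `pairs`, `InvertedRepeat`, `find_inverted_repeats`, `min_len`, `max_gap` and `max_mismatches` are hypothetical.

```python
"""Naive sketch: inverted repeats in IUPAC-encoded DNA with gaps and mismatches.

Hypothetical illustration, NOT IUPACpal's algorithm. Assumptions:
  * two IUPAC symbols "pair" when some base of the first is the Watson-Crick
    complement of some base of the second;
  * arms grow outward from the gap while the mismatch budget holds, and
    trailing mismatches at the outer ends are trimmed.
"""
from typing import Iterator, NamedTuple, Optional, Tuple

# IUPAC nucleotide codes mapped to the concrete bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pairs(x: str, y: str) -> bool:
    """True if some instantiation of x is complementary to some instantiation of y."""
    complement_of_y = {BASE_COMPLEMENT[b] for b in IUPAC[y]}
    return not set(IUPAC[x]).isdisjoint(complement_of_y)


class InvertedRepeat(NamedTuple):
    left_start: int  # 0-based index of the first symbol of the left arm
    right_end: int   # 0-based exclusive end of the right arm
    arm_len: int     # length of each arm
    gap: int         # number of unpaired symbols between the arms
    mismatches: int  # non-pairing positions inside the arms


def find_inverted_repeats(
    seq: str, min_len: int, max_gap: int, max_mismatches: int
) -> Iterator[InvertedRepeat]:
    """Yield, for every (centre, gap) pair, the longest arms within the budget."""
    seq = seq.upper()
    n = len(seq)
    for i in range(n):  # i: last index of the left arm
        for gap in range(max_gap + 1):
            j = i + 1 + gap  # j: first index of the right arm
            if j >= n:
                break
            mismatches = 0
            best: Optional[Tuple[int, int]] = None  # (arm_len, mismatches)
            k = 0
            # Walk outward: seq[i - k] must pair with seq[j + k].
            while i - k >= 0 and j + k < n:
                if pairs(seq[i - k], seq[j + k]):
                    best = (k + 1, mismatches)  # arms end on a paired position
                else:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break
                k += 1
            if best is not None and best[0] >= min_len:
                arm_len, mm = best
                yield InvertedRepeat(i - arm_len + 1, j + arm_len, arm_len, gap, mm)


if __name__ == "__main__":
    print(list(find_inverted_repeats("GAATTC", min_len=3, max_gap=0, max_mismatches=0)))
    # [InvertedRepeat(left_start=0, right_end=6, arm_len=3, gap=0, mismatches=0)]
```

Because of the assumed pairing rule, replacing a base by a degenerate code still yields a hit. For example, `"GANTTC"` reports the same repeat as `"GAATTC"`, because `N` includes `A`, which pairs with `T`.

The sketch reports every (centre, gap) combination separately, so overlapping hits are not merged. Symbols outside the IUPAC table raise `KeyError`. Efficient tools typically replace the inner character-by-character walk with indexed longest-common-extension queries.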
Pages: 12
Journal: BMC Bioinformatics, 22