Efficient Algorithms for Sequence Analysis with Entropic Profiles

Cited by: 5
Authors
Pizzi, Cinzia [1]
Ornamenti, Mattia [1]
Spangaro, Simone [1]
Rombo, Simona E. [2]
Parida, Laxmi [3]
Affiliations
[1] Univ Padua, Dept Informat Engn, Via Gradenigo 6-A, I-35131 Padua, Italy
[2] Univ Palermo, Dept Math & Comp Sci, Via Archirafi 34, I-90123 Palermo, Italy
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
String algorithms; sequence analysis; sequence comparison; suffix tree; suffix array; entropy; alignment free; sequence composition; COMMON SUBSTRING APPROACH; ALIGNMENT-FREE METHODS; SUFFIX TREES; MOTIFS; DISCOVERY; PATTERNS; MATCHES; WORDS; BASES; KMACS
DOI
10.1109/TCBB.2016.2620143
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure for assessing the degree of predictability of a sequence. Entropic profiles are based on information theory principles and can be used to study the under- and over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects of entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, specifically the truncated suffix tree (TST) and the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms, besides being faster than state-of-the-art algorithms, make it possible to analyze longer sequences, even at high degrees of resolution.
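As a rough illustration of what an entropic profile measures, the following is a minimal Python sketch. It is not the authors' method: it counts subwords naively with a dictionary in O(n * L) time, rather than using the TST or ESA that the abstract describes. The formula is an assumption, since this record does not state it; the sketch uses the form commonly given in the entropic profile literature, f(i) = (1 + (1/n) * sum_{k=1..L} (4*phi)^k * c_k(i)) / sum_{k=0..L} phi^k, where c_k(i) is the number of occurrences of the length-k subword ending at position i. The names `subword_counts`, `entropic_profile`, `L`, `phi`, and `alphabet_size` are hypothetical.

```python
from collections import Counter


def subword_counts(s, max_len):
    """Count every subword of length 1..max_len (naive, O(n * max_len))."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def entropic_profile(s, L, phi, alphabet_size=4):
    """Return one profile value per position of s.

    Assumed formula (not stated in this record):
      f(i) = (1 + (1/n) * sum_{k=1..L} (alphabet_size*phi)^k * c_k(i))
             / sum_{k=0..L} phi^k
    where c_k(i) counts occurrences of the length-k subword ending at i.
    Near the start of s, only the subwords that fit are summed.
    """
    n = len(s)
    counts = subword_counts(s, L)
    denom = sum(phi ** k for k in range(L + 1))
    profile = []
    for i in range(n):
        total = 0.0
        # Longest usable subword ending at i has length min(L, i + 1).
        for k in range(1, min(L, i + 1) + 1):
            word = s[i - k + 1:i + 1]
            total += (alphabet_size * phi) ** k * counts[word]
        profile.append((1.0 + total / n) / denom)
    return profile


if __name__ == "__main__":
    for pos, value in enumerate(entropic_profile("ACGTACGT", L=2, phi=1.0)):
        print(pos, round(value, 4))
```

Positions whose preceding subwords recur often receive higher values, which is how the profile flags over-represented words. A suffix-based index would replace the dictionary lookups to reach the linear time bound mentioned in the abstract.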
Pages: 117-128
Number of pages: 12