Efficient Algorithms for Sequence Analysis with Entropic Profiles

Cited by: 5
Authors
Pizzi, Cinzia [1]
Ornamenti, Mattia [1]
Spangaro, Simone [1]
Rombo, Simona E. [2]
Parida, Laxmi [3]
Affiliations
[1] Univ Padua, Dept Informat Engn, Via Gradenigo 6-A, I-35131 Padua, Italy
[2] Univ Palermo, Dept Math & Comp Sci, Via Archirafi 34, I-90123 Palermo, Italy
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
String algorithms; sequence analysis; sequence comparison; suffix tree; suffix array; entropy; alignment free; sequence composition; COMMON SUBSTRING APPROACH; ALIGNMENT-FREE METHODS; SUFFIX TREES; MOTIFS; DISCOVERY; PATTERNS; MATCHES; WORDS; BASES; KMACS;
DOI
10.1109/TCBB.2016.2620143
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure for assessing the degree of predictability of a sequence. Entropic profiles are based on information theory principles and can be used to study the under- and over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects of entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, specifically the truncated suffix tree (TST) and the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms, besides being faster than state-of-the-art algorithms, make it possible to analyze longer sequences, even at high degrees of resolution.
Pages: 117-128
Number of pages: 12
Related papers
50 in total
  • [21] Consistency of Sequence Classification with Entropic Priors
    Palmieri, Francesco A. N.
    Ciuonzo, Domenico
    BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING, 2012, 1443 : 338 - 345
  • [22] Efficient Algorithms for Touring a Sequence of Convex Polygons and Related Problems
    Tan, Xuehou
    Jiang, Bo
    THEORY AND APPLICATIONS OF MODELS OF COMPUTATION (TAMC 2017), 2017, 10185 : 613 - 626
  • [23] A comparative study of efficient algorithms for partitioning a sequence into monotone subsequences
    Yang, Bing
    Chen, Jing
    Lu, Enyue
    Zheng, S. Q.
    THEORY AND APPLICATIONS OF MODELS OF COMPUTATION, PROCEEDINGS, 2007, 4484 : 46 - +
  • [24] Efficient parameterized algorithms for biopolymer structure-sequence alignment
    Song, Yinglei
    Liu, Chunmei
    Huang, Xiuzhen
    Malmberg, Russell L.
    Xu, Ying
    Cai, Liming
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2006, 3 (04) : 423 - 432
  • [25] Efficient transaction sequence amalgamated algorithms for mining association rules
    Wang, L
    Xia, GP
    Shan, SQ
    ICIM' 2004: PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON INDUSTRIAL MANAGEMENT, 2004, : 785 - 789
  • [26] Efficient Approximation Algorithms for String Kernel Based Sequence Classification
    Farhan, Muhammad
    Tariq, Juvaria
    Zaman, Arif
    Shabbir, Mudassir
    Khan, Imdad Ullah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [27] Efficient Algorithms for T-Way Test Sequence Generation
    Yu, Linbin
    Lei, Yu
    Kacker, Raghu N.
    Kuhn, D. Richard
    Lawrence, James
    2012 17TH INTERNATIONAL CONFERENCE ON ENGINEERING OF COMPLEX COMPUTER SYSTEMS (ICECCS), 2012, : 220 - 229
  • [28] Local Renyi entropic profiles of DNA sequences
    Susana Vinga
    Jonas S Almeida
    BMC Bioinformatics, 8
  • [29] Local Renyi entropic profiles of DNA sequences
    Vinga, Susana
    Almeida, Jonas S.
    BMC BIOINFORMATICS, 2007, 8 (1)
  • [30] On the Efficiency of Entropic Regularized Algorithms for Optimal Transport
    Lin, Tianyi
    Ho, Nhat
    Jordan, Michael I.
    Journal of Machine Learning Research, 2022, 23