Efficient Algorithms for Sequence Analysis with Entropic Profiles

Cited by: 5
Authors
Pizzi, Cinzia [1]
Ornamenti, Mattia [1]
Spangaro, Simone [1]
Rombo, Simona E. [2]
Parida, Laxmi [3]
Affiliations
[1] Univ Padua, Dept Informat Engn, Via Gradenigo 6-A, I-35131 Padua, Italy
[2] Univ Palermo, Dept Math & Comp Sci, Via Archirafi 34, I-90123 Palermo, Italy
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
String algorithms; sequence analysis; sequence comparison; suffix tree; suffix array; entropy; alignment free; sequence composition; COMMON SUBSTRING APPROACH; ALIGNMENT-FREE METHODS; SUFFIX TREES; MOTIFS; DISCOVERY; PATTERNS; MATCHES; WORDS; BASES; KMACS;
DOI
10.1109/TCBB.2016.2620143
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure for assessing the degree of predictability of a sequence. Entropic profiles are based on information theory principles and can be used to study the under- and over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects of entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, specifically the truncated suffix tree (TST) and the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms, besides being faster than state-of-the-art algorithms, make it possible to analyze longer sequences, even at high degrees of resolution.
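As a rough illustration of the subword statistics that suffix-based structures expose, the sketch below counts how often each word ending at a given position occurs in a sequence, using a plain suffix array. It is not the authors' algorithm: the construction is naive rather than linear-time, it uses neither a TST nor an ESA, and it stops at raw counts without computing an entropic profile. The function names `build_suffix_array`, `count_occurrences`, and `suffix_counts` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(s):
    # Naive construction by sorting all suffixes (assumption: fine for
    # short demo strings; real tools use linear-time construction).
    return sorted(range(len(s)), key=lambda i: s[i:])


def count_occurrences(s, sa, w):
    # Suffixes are sorted, so their length-|w| prefixes are sorted too.
    # Every occurrence of w is the prefix of exactly one suffix, hence the
    # occurrences form one contiguous block that binary search can locate.
    prefixes = [s[i:i + len(w)] for i in sa]
    return bisect_right(prefixes, w) - bisect_left(prefixes, w)


def suffix_counts(s, sa, pos, max_len):
    # Counts of the words of length 1..max_len that END at index pos
    # (0-based). Stops early if the word would start before the sequence.
    counts = []
    for k in range(1, max_len + 1):
        start = pos - k + 1
        if start < 0:
            break
        counts.append(count_occurrences(s, sa, s[start:pos + 1]))
    return counts


if __name__ == "__main__":
    seq = "ACGTACGA"
    sa = build_suffix_array(seq)
    print(count_occurrences(seq, sa, "ACG"))
    print(suffix_counts(seq, sa, 6, 3))
```

Rebuilding the prefix list on every query keeps the sketch short at the cost of speed; a practical implementation would reuse precomputed intervals over the suffix array instead.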
Pages: 117-128
Number of pages: 12