Exact distribution of word counts in shuffled sequences

Times cited: 2
Authors
Rodland, EA [1]
Affiliation
[1] Univ Oslo, Rikshosp, Radiumhosp HF, Ctr Mol Biol & Neurosci,Inst Med Microbiol, N-0027 Oslo, Norway
Keywords
sequence shuffling; Markov chain; word count; exact distribution; hypergeometric distribution; generalised hypergeometric series; moment generating function; genome sequence analysis; directed graph; Euler path
DOI
10.1239/aap/1143936143
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
In DNA sequences, specific words may serve biological functions as marker or signalling sequences. These can often be identified by frequent-word analyses because they are particularly abundant. Accurate statistics are needed to assess the statistical significance of these word frequencies. The set of shuffled sequences (letter sequences having the same k-word composition, for some choice of k, as the sequence being analysed) is considered the most appropriate sample space for analysing word counts. However, little is known about the distribution of word counts in this sample space. Here we present exact formulae for word counts in shuffled sequences.
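To make the notion of a shuffled sequence concrete, the following is a minimal brute-force sketch in Python. It is not the paper's method: the abstract describes exact formulae, whereas this sketch simply enumerates every rearrangement of a very short sequence, keeps those with the same k-word composition, and tabulates the count of a chosen word under the assumption that all shuffled sequences are equally likely. The function names (`kword_counts`, `shuffled_sequences`, `exact_word_count_distribution`) are hypothetical, and the approach is only feasible for sequences of about ten letters or fewer.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def kword_counts(seq, k):
    """Count overlapping k-letter words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffled_sequences(seq, k):
    """Return, sorted, all distinct rearrangements of seq that have the
    same k-word composition as seq (brute force over permutations)."""
    target = kword_counts(seq, k)
    candidates = {"".join(p) for p in permutations(seq)}
    return sorted(s for s in candidates if kword_counts(s, k) == target)


def exact_word_count_distribution(seq, k, word):
    """Exact distribution of the overlapping count of `word`, assuming a
    uniform choice among the shuffled sequences (illustrative assumption)."""
    shuffles = shuffled_sequences(seq, k)
    tally = Counter(kword_counts(s, len(word))[word] for s in shuffles)
    total = len(shuffles)
    return {count: Fraction(n, total) for count, n in sorted(tally.items())}


if __name__ == "__main__":
    # Toy example: shuffles of ACAGA preserving its 2-word composition.
    print(shuffled_sequences("ACAGA", 2))
    print(exact_word_count_distribution("ACAGA", 2, "CAG"))
```

For the toy sequence ACAGA with k = 2, only ACAGA and AGACA share the 2-word composition {AC, CA, AG, GA}, so the word CAG occurs once with probability 1/2 and not at all with probability 1/2. The enumeration grows factorially with sequence length, which is why closed-form results are needed for realistic sequences.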
Pages: 116-133
Page count: 18