Finding optimal pairs of patterns

Authors
Bannai, H [1]
Hyyrö, H
Shinohara, A
Takeda, M
Nakai, K
Miyano, S
Affiliations
[1] Univ Tokyo, Inst Med Sci, Human Genome Ctr, Tokyo 1088639, Japan
[2] Kyushu Univ 33, Dept Informat, Fukuoka 8128581, Japan
Source
Keywords
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
We consider the problem of finding the optimal pair of string patterns for discriminating between two sets of strings, i.e. finding the pair of patterns that is best with respect to some appropriate scoring function that gives higher scores to pattern pairs which occur more in the strings of one set, but less in the other. We present an O(N^2) time algorithm for finding the optimal pair of substring patterns, where N is the total length of the strings. The algorithm considers all possible Boolean combinations of the patterns, e.g. patterns of the form p ∧ ¬q, which indicates that the pattern pair is considered to match a given string s if p occurs in s AND q does NOT occur in s. The same algorithm can be applied to a variant of the problem where we are given a single set of sequences along with a numeric attribute assigned to each sequence, and the problem is to find the optimal pattern pair whose occurrence in the sequences is correlated with this numeric attribute. An efficient implementation based on suffix arrays is presented, and the algorithm is applied to several nucleotide sequence datasets of moderate size, combined with microarray gene expression data, aiming to find regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing certain genomic functions.
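To make the problem statement concrete, the following Python sketch searches for a best-scoring pattern pair by brute force. It is not the O(N^2) algorithm or the suffix-array implementation described in the abstract, which are not reproduced in this record. The function names, the `max_len` cap on pattern length, and the scoring function (fraction of positive strings matched minus fraction of negative strings matched) are all assumptions chosen for illustration.

```python
from itertools import product

# Boolean combinations of two occurrence bitmasks. `full` clears the bits
# beyond the number of strings after a negation. "NOT p AND q" and
# "NOT p OR q" are covered because ordered pairs (p, q) are enumerated.
COMBOS = {
    "p AND q": lambda a, b, full: a & b,
    "p AND NOT q": lambda a, b, full: a & ~b & full,
    "NOT p AND NOT q": lambda a, b, full: ~a & ~b & full,
    "p OR q": lambda a, b, full: a | b,
    "p OR NOT q": lambda a, b, full: (a | ~b) & full,
    "NOT p OR NOT q": lambda a, b, full: (~a | ~b) & full,
}


def substrings(s, max_len):
    """All distinct substrings of s with length 1..max_len."""
    return {s[i:i + n] for n in range(1, max_len + 1)
            for i in range(len(s) - n + 1)}


def occurrence_mask(pattern, strings):
    """Bitmask whose k-th bit is set iff pattern occurs in strings[k]."""
    return sum(1 << k for k, s in enumerate(strings) if pattern in s)


def best_pattern_pair(pos, neg, max_len=3):
    """Return (score, p, combination_name, q) with the highest score.

    Assumed score: fraction of `pos` matched minus fraction of `neg` matched.
    Both `pos` and `neg` must be non-empty.
    """
    strings = list(pos) + list(neg)
    full = (1 << len(strings)) - 1
    pos_mask = (1 << len(pos)) - 1      # positive strings use the low bits
    neg_mask = full ^ pos_mask

    # Patterns with the same occurrence mask score identically, so keep
    # one representative pattern per distinct mask.
    reps = {}
    for s in strings:
        for p in sorted(substrings(s, max_len)):
            reps.setdefault(occurrence_mask(p, strings), p)

    def score(mask):
        hits_pos = bin(mask & pos_mask).count("1")
        hits_neg = bin(mask & neg_mask).count("1")
        return hits_pos / len(pos) - hits_neg / len(neg)

    best = None
    for (ma, p), (mb, q) in product(reps.items(), repeat=2):
        for name, combine in COMBOS.items():
            sc = score(combine(ma, mb, full))
            if best is None or sc > best[0]:
                best = (sc, p, name, q)
    return best
```

For example, with `pos = ["xay", "zaw"]` and `neg = ["xab", "qqq"]`, the pair p = "a", q = "b" under "p AND NOT q" matches both positive strings and neither negative string, so the search can reach the maximum score under this assumed scoring function. Grouping patterns by occurrence mask keeps the pair enumeration small, but the sketch makes no attempt to match the time bound stated in the abstract.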
Pages: 450-462
Number of pages: 13