Determining significance of pairwise co-occurrences of events in bursty sequences

Cited by: 14
Authors
Haiminen, Niina [1]
Mannila, Heikki [1, 2]
Terzi, Evimaria [3]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, HIIT, FIN-00014 Helsinki, Finland
[2] Aalto Univ, Lab Comp & Informat Sci, HIIT, FI-02015 Helsinki, Finland
[3] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
Funding
Academy of Finland
DOI
10.1186/1471-2105-9-336
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Event sequences in which different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, the events) of certain transcription factors (TF, the types) in a DNA sequence. These events tend to occur in bursts: some genomic regions contain more genes and therefore potentially more binding sites, while in other, possibly very long regions, hardly any events occur. Also, some types of events may occur in the sequence more often than others. Tendencies of co-occurrence of binding sites of two or more TFs are interesting, as they may imply a co-operative role between the TFs in regulatory processes. A numerical value summarizing the tendency for co-occurrence between two TFs can be determined in a number of ways. However, the significance of such values should be tested with respect to a relevant null model that takes the global sequence structure into account.
Results: We extend the existing techniques for determining the significance of co-occurrence patterns between a pair of event types under different null models. These models range from very simple ones to more complex models that take the burstiness of sequences into account. We evaluate the models and techniques on synthetic event sequences and on real data consisting of potential transcription factor binding sites.
Conclusion: We show that simple null models are poorly suited for bursty data and yield many false positives. More sophisticated models give better results in our experiments. We also demonstrate the effect of the window size, i.e., the maximum co-occurrence distance, on the significance results.
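To make the idea of testing a co-occurrence score against a null model concrete, here is a minimal sketch, not the authors' implementation. It assumes events are integer positions on a sequence of length `seq_len`, scores co-occurrence as the number of type-A events with at least one type-B event within a window `w`, and uses the simplest kind of null model (type-B positions redrawn uniformly at random). The function names, the score, and the null model are all illustrative assumptions; the abstract indicates that such simple models yield many false positives on bursty data, and the burst-aware models studied in the paper are not reproduced here.

```python
import bisect
import random


def cooccurrence_count(a_pos, b_pos, w):
    """Count type-A events that have at least one type-B event within distance w.

    This score is an illustrative assumption, not the paper's definition.
    """
    b_sorted = sorted(b_pos)
    count = 0
    for x in a_pos:
        # Index of the first B position that is >= x - w.
        i = bisect.bisect_left(b_sorted, x - w)
        if i < len(b_sorted) and b_sorted[i] <= x + w:
            count += 1
    return count


def empirical_p_value(a_pos, b_pos, w, seq_len, n_samples=1000, seed=0):
    """Empirical p-value under a uniform null model (assumed, simplest case).

    Type-A positions stay fixed; type-B positions are redrawn uniformly
    from 0..seq_len-1, which ignores any burstiness in the sequence.
    """
    rng = random.Random(seed)
    observed = cooccurrence_count(a_pos, b_pos, w)
    at_least_as_extreme = 0
    for _ in range(n_samples):
        b_random = [rng.randrange(seq_len) for _ in b_pos]
        if cooccurrence_count(a_pos, b_random, w) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_samples + 1)
```

Calling `empirical_p_value(a_pos, b_pos, w=50, seq_len=100000)` for several values of `w` gives a rough feel for how the window size changes the result. Replacing the uniform resampling step with a sampler that preserves the bursty structure is where a more realistic null model would plug in.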
Pages: 10