Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers

Cited by: 4
Authors
Orenstein, Yaron [1]
Berger, Bonnie [1,2]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] MIT, Dept Math, 77 Mass Ave,2-373, Cambridge, MA 02139 USA
Funding
National Institutes of Health (USA);
Keywords
de Bruijn graph; microarray library design; RNA secondary structure; universal DNA microarrays; binding sites; recognition; sequences
DOI
10.1089/cmb.2015.0179
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Current microarray technologies for determining RNA structure or measuring protein-RNA interactions rely on single-stranded, unstructured RNA probes on a chip that together cover all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured l-long RNA probes in which each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, l, and p using ad hoc rules. To our knowledge, no general method exists to date for solving this problem. Here, we address the problem of finding a minimum-size covering of all k-mers by l-long sequences with the desired properties for any values of k, l, and p. Because we prove that the problem is NP-hard, we give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second is a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes that is only about 1.1 times larger than the theoretical lower bound. We present results for typical values of k and probe lengths l and show that our algorithm generates a library that is significantly smaller than the library of Ray et al.; moreover, we show that our algorithm outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available online.
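To make the covering problem concrete, the following is a minimal Python sketch of a greedy walk in a de Bruijn graph, not the authors' algorithm or software. It assumes a simple counting bound (each l-long probe contains at most l - k + 1 distinct k-mers, so at least p * 4^k / (l - k + 1) probes are needed), which may differ from the lower bound used in the paper. The names `counting_lower_bound`, `greedy_walk_library`, and the `is_unstructured` hook are hypothetical; the hook accepts every probe by default, so no RNA folding check is actually performed here.

```python
import itertools
import random

ALPHABET = "ACGU"


def counting_lower_bound(k, l, p):
    """Simple counting bound (an assumption, not necessarily the paper's):
    4**k k-mers must each be covered p times, and one l-long probe
    contains at most l - k + 1 distinct k-mers."""
    needed = p * 4 ** k
    per_probe = l - k + 1
    return -(-needed // per_probe)  # integer ceiling division


def greedy_walk_library(k, l, p, seed=0, is_unstructured=lambda probe: True):
    """Build l-long probes so that every k-mer occurs in at least p probes.

    Each probe is a walk in the de Bruijn graph: start at a k-mer that still
    needs coverage, then extend one base at a time, preferring bases whose
    new k-mer still needs coverage and is not already in this probe.
    `is_unstructured` is a hypothetical placeholder for a structure filter;
    a predicate that rejects too often could make the loop run very long.
    """
    if l < k:
        raise ValueError("probe length l must be at least k")
    rng = random.Random(seed)
    # remaining[kmer] = how many more probes must contain this k-mer
    remaining = {"".join(t): p for t in itertools.product(ALPHABET, repeat=k)}
    library = []
    while remaining:
        probe = rng.choice(sorted(remaining))
        seen = {probe}  # distinct k-mers in the current probe
        while len(probe) < l:
            suffix = probe[len(probe) - k + 1:]  # last k-1 bases
            useful = [c for c in ALPHABET
                      if (suffix + c) in remaining and (suffix + c) not in seen]
            probe += rng.choice(useful or list(ALPHABET))
            seen.add(probe[-k:])
        if not is_unstructured(probe):
            continue  # discard the probe and try another walk
        for kmer in seen:  # a k-mer counts once per probe
            if kmer in remaining:
                remaining[kmer] -= 1
                if remaining[kmer] == 0:
                    del remaining[kmer]
        library.append(probe)
    return library


if __name__ == "__main__":
    lib = greedy_walk_library(k=3, l=10, p=1)
    print(len(lib), "probes; counting bound:", counting_lower_bound(3, 10, 1))
```

Comparing `len(lib)` with `counting_lower_bound(k, l, p)` gives a rough sense of how far a naive walk is from the bound; a real design would replace the default `is_unstructured` with an actual secondary-structure prediction.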
Pages: 67-79
Number of pages: 13
Related papers
28 in total
  • [1] Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
    Orenstein, Yaron
    Berger, Bonnie
    ALGORITHMS IN BIOINFORMATICS (WABI 2015), 2015, 9289 : 308 - 325
  • [2] Kcollections: A Fast and Efficient Library for K-mers
    Fujimoto, M. Stanley
    Lyman, Cole A.
    Clement, Mark J.
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020), 2020, : 193 - 198
  • [3] Joker de Bruijn: Sequence Libraries to Cover All k-mers Using Joker Characters
    Orenstein, Yaron
    Kim, Ryan
    Fordyce, Polly
    Berger, Bonnie
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2017, 2017, 10229 : 389 - 390
  • [4] Random sequential covering of a one-dimensional lattice by k-mers
    Viot, Pascal
    Krapivsky, P. L.
    JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2025, 2025 (01):
  • [5] Joker de Bruijn: Covering k-Mers Using Joker Characters
    Orenstein, Yaron
    Yu, Yun William
    Berger, Bonnie
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2018, 25 (11) : 1171 - 1178
  • [6] BLight: efficient exact associative structure for k-mers
    Marchet, Camille
    Kerbiriou, Mael
    Limasset, Antoine
    BIOINFORMATICS, 2021, 37 (18) : 2858 - 2865
  • [7] Not all K-Mers are Equal - Some are Interesting, Some are Boring
    Kaplinski, Lauris
    Remm, Maido
    HUMAN HEREDITY, 2016, 81 (04) : 233 - 234
  • [8] Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers
    Orenstein, Yaron
    Shamir, Ron
    BIOINFORMATICS, 2013, 29 (13) : 71 - 79
  • [9] Efficient counting of k-mers in DNA sequences using a bloom filter
    Melsted, Pall
    Pritchard, Jonathan K.
    BMC BIOINFORMATICS, 2011, 12
  • [10] Turtle: Identifying frequent k-mers with cache-efficient algorithms
    Roy, Rajat Shuvro
    Bhattacharya, Debashish
    Schliep, Alexander
    BIOINFORMATICS, 2014, 30 (14) : 1950 - 1957