A k-mer scheme to predict piRNAs and characterize locust piRNAs

Cited by: 105
Authors
Zhang, Yi [1 ,2 ]
Wang, Xianhui [1 ]
Kang, Le [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Zool, State Key Lab Integrated Management Pest Insect &, Beijing 100101, Peoples R China
[2] Hebei Univ Sci & Technol, Hebei Lab Pharmaceut Mol Chem, Dept Math, Shijiazhuang 050018, Herts, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
PIWI-INTERACTING RNAS; GERMLINE; PROTEINS; BIOGENESIS; SEQUENCES; ELEGANS; PATHWAY; GENOMES; SIRNAS; TESTES;
DOI
10.1093/bioinformatics/btr016
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Identifying piwi-interacting RNAs (piRNAs) in non-model organisms remains a difficult, unsolved problem because piRNAs lack conserved secondary-structure motifs and show no sequence homology across species.
Results: In this article, a k-mer scheme is proposed to identify piRNA sequences. It is trained on piRNA and non-piRNA sequences from five sequenced model species: rat, mouse, human, fruit fly and nematode. Compared with the existing 'static' scheme based on position-specific base usage, our novel 'dynamic' algorithm performs much better, with a precision of over 90% and a sensitivity of over 60%; the precision is verified by 5-fold cross-validation in these species. To test its validity, we apply the algorithm to 603 607 deep-sequenced small RNA sequences from the migratory locust. In total, 87 536 locust piRNAs are predicted, 4426 of which match existing locust transposons. The transcriptional difference between solitary and gregarious locusts is described. We also revisit the position-specific base usage of piRNAs and find conservation at the end of piRNAs. The method we developed can therefore be used to identify piRNAs in non-model organisms that lack complete genome sequences.
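To make the terms in the abstract concrete, the following is a minimal Python sketch of how a sequence can be turned into k-mer frequency features, and how precision and sensitivity are computed from a confusion matrix. It is not the authors' implementation: the abstract does not state the value of k, the feature weighting or the classifier, so the function names (`kmer_frequencies`, `precision`, `sensitivity`), the RNA alphabet and the use of relative frequencies are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; DNA-style reads are converted below


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers of `seq` in a fixed order.

    Hypothetical helper for illustration; windows containing other symbols
    (e.g. N) are counted in the total but have no slot in the output vector.
    """
    seq = seq.upper().replace("T", "U")
    total = max(len(seq) - k + 1, 0)  # number of overlapping windows
    counts = Counter(seq[i:i + k] for i in range(total))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


def precision(tp, fp):
    """Fraction of predicted piRNAs that are true piRNAs: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def sensitivity(tp, fn):
    """Fraction of true piRNAs that are recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0
```

A feature vector built this way (possibly concatenated over several values of k) could be fed to any standard classifier. For scale, the locust figures quoted in the abstract correspond to roughly 14.5% of the small RNA sequences being predicted as piRNAs (87 536 of 603 607) and roughly 5.1% of those predictions matching known transposons (4426 of 87 536).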
Pages: 771-776
Number of pages: 6