Computational Identification of piRNAs Using Features Based on RNA Sequence, Structure, Thermodynamic and Physicochemical Properties

Cited by: 19
Authors
Monga, Isha [1]
Banerjee, Indranil [1]
Affiliations
[1] Indian Inst Sci Educ & Res Mohali IISER Mohali, Cellular Virol Lab, Dept Biol Sci, Sect 81, Sas Nagar 140306, Mohali, India
Keywords
piRNA; classification; algorithm; prediction; non-coding RNA; physicochemical; PIWI-INTERACTING RNAS; MESSENGER-RNAS; BIOGENESIS; SIRNAS; PREDICTION; PROTEINS;
DOI
10.2174/1389202920666191129112705
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Rationale: PIWI-interacting RNAs (piRNAs) are a recently discovered class of small non-coding RNAs (ncRNAs) 21-35 nucleotides in length. They play roles in the regulation of gene expression, transposon silencing, and the inhibition of viral infection. Once considered the "dark matter" of ncRNAs, piRNAs have emerged as important players in multiple cellular functions across different organisms. However, our knowledge of piRNAs remains very limited, as many piRNAs have not yet been identified owing to the lack of robust computational prediction tools. Methods: To identify novel piRNAs, we developed piRNAPred, an integrated framework for piRNA prediction that employs hybrid features such as k-mer nucleotide composition, secondary structure, and thermodynamic and physicochemical properties. A non-redundant dataset (D-3349, or D1684p+1665n) comprising 1684 experimentally verified piRNAs and 1665 non-piRNA sequences was obtained from piRBase and NONCODE, respectively. Various sequence- and structure-based features were computed for these sequences in binary format and used to train different machine learning techniques, of which the support vector machine (SVM) performed best. Results: Under ten-fold cross-validation (10-CV), piRNAPred achieved an overall accuracy of 98.60%, with a Matthews correlation coefficient (MCC) of 0.97 and a receiver operating characteristic (ROC) value of 0.99. Furthermore, we reduced the dimensionality of the feature space using an attribute-selected classifier. Conclusion: piRNAPred achieved the highest performance in accurately predicting piRNAs compared with current state-of-the-art piRNA predictors. It should help to expand the piRNA repertoire and provide new insights into piRNA functions.
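As an illustration of two quantities named in the abstract, the following minimal sketch computes a k-mer nucleotide composition vector for an RNA sequence and the Matthews correlation coefficient from confusion-matrix counts. It is not the piRNAPred implementation: the function names `kmer_composition` and `mcc`, the A/C/G/U alphabet ordering, and the handling of ambiguous bases are assumptions made for this example only.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # assumed ordering; the record does not specify one


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases (e.g. N)
            counts[window] += 1
            total += 1
    # Frequencies sum to 1 when at least one valid window exists.
    return [counts[m] / total if total else 0.0 for m in kmers]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A vector such as `kmer_composition(seq, k=3)` has 4^k = 64 entries and could be concatenated with structural, thermodynamic, and physicochemical descriptors before being passed to a classifier such as an SVM.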
Pages: 508-518
Number of pages: 11