Computational Identification of piRNAs Using Features Based on RNA Sequence, Structure, Thermodynamic and Physicochemical Properties

Cited by: 19
Authors
Monga, Isha [1]
Banerjee, Indranil [1]
Affiliations
[1] Indian Inst Sci Educ & Res Mohali IISER Mohali, Cellular Virol Lab, Dept Biol Sci, Sect 81, Sas Nagar 140306, Mohali, India
Keywords
piRNA; classification; algorithm; prediction; non-coding RNA; physicochemical; PIWI-INTERACTING RNAS; MESSENGER-RNAS; BIOGENESIS; SIRNAS; PREDICTION; PROTEINS;
DOI
10.2174/1389202920666191129112705
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Rationale: PIWI-interacting RNAs (piRNAs) are a recently discovered class of small noncoding RNAs (ncRNAs) with a length of 21-35 nucleotides. They play a role in gene expression regulation, transposon silencing, and inhibition of viral infection. Once considered the "dark matter" of ncRNAs, piRNAs have emerged as important players in multiple cellular functions in different organisms. However, our knowledge of piRNAs is still very limited, as many piRNAs have not yet been identified owing to a lack of robust computational prediction tools.
Methods: To identify novel piRNAs, we developed piRNAPred, an integrated framework for piRNA prediction that employs hybrid features such as k-mer nucleotide composition, secondary structure, and thermodynamic and physicochemical properties. A non-redundant dataset (D-3349, or D1684p+1665n) comprising 1684 experimentally verified piRNAs and 1665 non-piRNA sequences was obtained from piRBase and NONCODE, respectively. Various sequence- and structure-based features were computed for these sequences in binary format and used to train different machine learning techniques, of which the support vector machine (SVM) performed best.
Results: In ten-fold cross-validation (10-CV), piRNAPred achieved an overall accuracy of 98.60%, with a Matthews correlation coefficient (MCC) of 0.97 and a receiver operating characteristic (ROC) value of 0.99. Furthermore, we reduced the dimensionality of the feature space using an attribute selected classifier.
Conclusion: We obtained the highest performance in accurately predicting piRNAs compared with current state-of-the-art piRNA predictors. piRNAPred would be helpful for expanding the piRNA repertoire and providing new insights into piRNA functions.
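As a rough illustration of two ideas named in the abstract, the sketch below computes a k-mer nucleotide composition vector for an RNA sequence and evaluates the Matthews correlation coefficient from confusion-matrix counts. It is not the piRNAPred implementation: the function names `kmer_composition` and `mcc`, the choice of k, the normalization to frequencies, and the handling of ambiguous bases are all assumptions made for this example, and the structural, thermodynamic, and physicochemical features are not covered.

```python
from itertools import product
from math import sqrt


def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: the record does not state how piRNAPred
    encodes k-mer features, so this is only one plausible encoding.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A vector such as `kmer_composition("UGAGGUAGUAGG", k=2)` has 16 entries (one per dinucleotide) and could be concatenated with other feature groups before training a classifier; `mcc` ranges from -1 (total disagreement) to 1 (perfect prediction).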
Pages: 508-518
Page count: 11