A computational approach for prediction of donor splice sites with improved accuracy

Cited by: 7
Authors
Meher, Prabina Kumar [1 ]
Sahu, Tanmaya Kumar [1 ]
Rao, A. R. [1 ]
Wahi, S. D. [1 ]
Affiliations
[1] ICAR Indian Agr Stat Res Inst, New Delhi 110012, India
Keywords
Machine learning; PreDOSS; Sequence encoding; Di-nucleotide dependency; Conditional error; FEATURES;
DOI
10.1016/j.jtbi.2016.06.013
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Identification of splice sites is important because of their key role in predicting the exon-intron structure of protein-coding genes. Although several approaches have been developed for splice site prediction, further improvement in prediction accuracy would help predict gene structure more accurately. This paper presents a computational approach for predicting donor splice sites with higher accuracy. In this approach, true and false splice sites were first encoded into numeric vectors and then used as input to an artificial neural network (ANN), a support vector machine (SVM) and a random forest (RF) for prediction. When tested on the HS3D and NN269 datasets, ANN and SVM performed equally well and better than RF. The performance of ANN, SVM and RF was further analyzed on an independent test set of 50 genes, where the prediction accuracy of ANN was higher than that of SVM and RF. On this independent test set, all three predictors achieved higher accuracy than existing methods such as NNsplice, MEM, MDD, WMM, MM1, FSPLICE, GeneID and ASSP. We have also developed an online prediction server (PreDOSS), available at http://cabgrid.res.in:8080/predoss, for prediction of donor splice sites using the proposed approach. (C) 2016 Elsevier Ltd. All rights reserved.
Pages: 285-294
Number of pages: 10
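
The abstract says that true and false splice sites are encoded into numeric vectors before being passed to a classifier, but this record does not describe the encoding itself. The sketch below illustrates the general idea with plain one-hot encoding, which is a stand-in and not the encoding proposed in the paper. The function name `one_hot_encode` and the two example sequences are hypothetical.

```python
# Illustrative only: one-hot encoding is an assumed stand-in,
# not the sequence encoding proposed in the paper.
NUCLEOTIDES = "ACGT"


def one_hot_encode(seq):
    """Turn a DNA string into a flat 0/1 vector, four entries per base.

    Bases outside A/C/G/T (for example N) become four zeros.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


# Hypothetical toy windows, each containing a GT dinucleotide.
true_site = "AAGGTAAGT"   # made-up example labelled as a true donor site
false_site = "CCTGTCCAT"  # made-up example labelled as a false site

X = [one_hot_encode(true_site), one_hot_encode(false_site)]
y = [1, 0]  # 1 = true site, 0 = false site
```

Every window of the same length yields a vector of the same length, so `X` and `y` could be handed to any standard classifier, such as the ANN, SVM or RF models named in the abstract.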