A computational approach for prediction of donor splice sites with improved accuracy

Times cited: 7
Authors
Meher, Prabina Kumar [1]
Sahu, Tanmaya Kumar [1]
Rao, A. R. [1]
Wahi, S. D. [1]
Affiliation
[1] ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
Keywords
Machine learning; PreDOSS; Sequence encoding; Di-nucleotide dependency; Conditional error; Features
DOI
10.1016/j.jtbi.2016.06.013
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Identification of splice sites is important because of their key role in predicting the exon-intron structure of protein-coding genes. Although several approaches have been developed for splice-site prediction, further improvement in prediction accuracy would help predict gene structure more accurately. This paper presents a computational approach for predicting donor splice sites with higher accuracy. In this approach, true and false splice sites were first encoded into numeric vectors, which were then used as input to an artificial neural network (ANN), a support vector machine (SVM) and a random forest (RF) for prediction. When tested on the HS3D and NN269 datasets, ANN and SVM performed equally well, and both outperformed RF. The performance of ANN, SVM and RF was further analyzed on an independent test set of 50 genes, where ANN achieved higher prediction accuracy than SVM and RF. On the same independent test set, all three predictors achieved higher accuracy than existing methods such as NNsplice, MEM, MDD, WMM, MM1, FSPLICE, GeneID and ASSP. We have also developed an online prediction server (PreDOSS), available at http://cabgrid.res.in:8080/predoss, for predicting donor splice sites with the proposed approach. (C) 2016 Elsevier Ltd. All rights reserved.
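The abstract states that true and false splice sites are encoded into numeric vectors before classification, but it does not spell out the encoding. As a rough illustration only, the sketch below shows one common way to turn fixed-length donor-site windows into numeric vectors: position-specific adjacent-dinucleotide log-odds estimated from true versus false sites. This is an assumed stand-in, not the PreDOSS encoding described in the paper; the function names (`dinuc_counts`, `fit_log_odds`, `encode`), the pseudocount, and the toy sequences are all hypothetical.

```python
import math
from itertools import product

# All 16 ordered dinucleotides over the DNA alphabet.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinuc_counts(seqs, pseudo=1.0):
    """Per-position counts of adjacent dinucleotides, with a pseudocount.

    Returns one dict per adjacent position pair (i, i+1).
    Assumption: all sequences are equal-length, uppercase ACGT windows
    (an N or lowercase base would raise KeyError).
    """
    length = len(seqs[0])
    tables = [{d: pseudo for d in DINUCS} for _ in range(length - 1)]
    for s in seqs:
        for i in range(length - 1):
            tables[i][s[i:i + 2]] += 1
    return tables


def fit_log_odds(true_seqs, false_seqs, pseudo=1.0):
    """Position-specific log-odds of each dinucleotide, true vs false sites.

    Hypothetical stand-in encoding model, not the paper's method.
    """
    t_tables = dinuc_counts(true_seqs, pseudo)
    f_tables = dinuc_counts(false_seqs, pseudo)
    model = []
    for t, f in zip(t_tables, f_tables):
        t_total = sum(t.values())
        f_total = sum(f.values())
        model.append(
            {d: math.log((t[d] / t_total) / (f[d] / f_total)) for d in DINUCS}
        )
    return model


def encode(seq, model):
    """Map a window to a numeric vector: one log-odds score per position pair."""
    return [model[i][seq[i:i + 2]] for i in range(len(seq) - 1)]


# Toy windows (hypothetical data) with the GT donor dinucleotide at positions 3-4.
true_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
false_sites = ["CTGGTTCCA", "GAGGTCTTC", "ACGGTACAG"]

model = fit_log_odds(true_sites, false_sites)
vec = encode("CAGGTAAGT", model)
print(len(vec), [round(x, 3) for x in vec])
```

Because every toy window carries GT at positions 3-4, that position pair receives a score of 0 and carries no discriminating information; the remaining pairs score positive where a dinucleotide is more frequent in true sites. Vectors of this kind could then be passed to any classifier, for example scikit-learn's `SVC` or `RandomForestClassifier`, in the spirit of the ANN/SVM/RF comparison the abstract describes.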
Pages: 285-294
Number of pages: 10