BERMP: a cross-species classifier for predicting m6A sites by integrating a deep learning algorithm and a random forest approach
Cited by: 84
Authors:
Huang, Yu [1]
He, Ningning [2]
Chen, Yu [1]
Chen, Zhen [2]
Li, Lei [1,2,3,4]
Affiliations:
[1] Qingdao Univ, Sch Data Sci & Software Engn, Qingdao 266021, Peoples R China
[2] Qingdao Univ, Sch Basic Med, Qingdao 266021, Peoples R China
[3] Qingdao Univ, Affiliated Hosp, Canc Inst, Qingdao 266061, Shandong, Peoples R China
[4] Qingdao Canc Inst, Qingdao 266061, Shandong, Peoples R China
Source:
Funding:
National Natural Science Foundation of China;
Keywords:
Deep learning;
Recurrent neural network;
bidirectional Gated Recurrent Unit;
N-6-methyladenosine;
Random forest;
PHYSICAL-CHEMICAL PROPERTIES;
MESSENGER-RNA;
N-6-METHYLADENOSINE SITES;
ARABIDOPSIS-THALIANA;
NEURAL-NETWORKS;
WIDESPREAD;
RESOLUTION;
REVEALS;
DOI:
10.7150/ijbs.27819
Chinese Library Classification number:
Q5 [Biochemistry];
Q7 [Molecular Biology];
Discipline classification code:
071010 ;
081704 ;
Abstract:
N6-methyladenosine (m6A) is a prevalent RNA methylation modification involved in several biological processes. Hundreds or thousands of m6A sites identified from different species using high-throughput experiments provide a rich resource for constructing in-silico approaches to identify m6A sites. The existing m6A predictors are developed using conventional machine-learning (ML) algorithms, and most are species-centric. In this paper, we develop a novel cross-species deep-learning classifier based on the bidirectional Gated Recurrent Unit (BGRU) for the prediction of m6A sites. In comparison with conventional ML approaches, BGRU achieves outstanding performance on the Mammalia dataset, which contains over fifty thousand m6A sites, but inferior performance on the Saccharomyces cerevisiae dataset, which covers around a thousand positives. The accuracy of BGRU is sensitive to the data size, and this sensitivity is compensated for by integrating a random forest classifier with a novel encoding of enhanced nucleic acid content. The integrated approach, dubbed the BGRU-based Ensemble RNA Methylation site Predictor (BERMP), has competitive performance in both cross-validation and independent tests. BERMP also outperforms existing m6A predictors for different species. Therefore, BERMP is a novel multi-species tool for identifying m6A sites with high confidence.
Pages: 1669-1677
Number of pages: 9