MulStack: An ensemble learning prediction model of multilabel mRNA subcellular localization

Cited by: 2
Authors
Liu Z. [1]
Bai T. [2, 3, 4]
Liu B. [3, 4]
Yu L. [1]
Affiliations
[1] School of Computer Science and Technology, Xidian University, Xi'an
[2] School of Mathematics & Computer Science, Yan'an University, Shaanxi
[3] School of Computer Science and Technology, Beijing Institute of Technology, Beijing
[4] Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Ensemble learning predictor; mRNA features at two levels; Multilabel mRNA subcellular localization; Position encoding;
DOI
10.1016/j.compbiomed.2024.108289
Abstract
Subcellular localization of mRNA is related to protein synthesis, cell polarity, cell movement and other biological regulation mechanisms. The distribution of mRNAs across subcellular compartments is similar to that of proteins, and most mRNAs are distributed in multiple compartments. Recently, some computational methods have been designed to predict the subcellular localization of mRNA. However, these methods employed only a single level of mRNA features and did not employ position encoding of the nucleotides in mRNA. In this paper, an ensemble learning prediction model named MulStack is proposed, which is based on random forest and deep learning for multilabel mRNA subcellular localization. The proposed method employs two levels of mRNA features, sequence-level and residue-level, and position encoding is employed for the first time in the field of mRNA subcellular localization. Random forest is employed to learn mRNA sequence-level features, while deep learning is employed to learn mRNA sequence-level features and residue-level features combined with position encoding. The outputs of the random forest and the deep learning model are combined by a weighted sum to give the prediction probability. Compared with existing methods, the results show that MulStack performs best in localization to the nucleus, cytosol and exosome. In addition, position weight matrices (PWMs) extracted by convolutional neural networks (CNNs) can be matched with known RNA binding protein motifs. Gene ontology (GO) enrichment analysis shows the biological processes, molecular functions and cellular components of mRNA genes. The prediction web server of MulStack is freely accessible at http://bliulab.net/MulStack. © 2024 Elsevier Ltd
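The weighted-sum step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weight of 0.5, the decision threshold of 0.5, and the three-compartment label list are all assumptions made for illustration, and the per-label probabilities are made-up inputs standing in for the outputs of a random forest and a deep learning model.

```python
from typing import List, Sequence

# Hypothetical label set; the abstract names these compartments but does not
# state that they are the full label set.
LABELS = ["nucleus", "cytosol", "exosome"]


def weighted_fusion(rf_probs: Sequence[float],
                    dl_probs: Sequence[float],
                    rf_weight: float = 0.5) -> List[float]:
    """Combine per-label probabilities from two models by a weighted sum.

    rf_weight is the weight given to the random forest output; the deep
    learning output receives 1 - rf_weight. The value 0.5 is an assumption.
    """
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must lie in [0, 1]")
    if len(rf_probs) != len(dl_probs):
        raise ValueError("both models must score the same set of labels")
    return [rf_weight * r + (1.0 - rf_weight) * d
            for r, d in zip(rf_probs, dl_probs)]


def predict_labels(probs: Sequence[float],
                   labels: Sequence[str],
                   threshold: float = 0.5) -> List[str]:
    """Multilabel decision: keep every label whose fused probability
    reaches the threshold, so one mRNA can receive several compartments."""
    return [label for label, p in zip(labels, probs) if p >= threshold]


# Made-up probabilities for a single mRNA, one value per label.
rf_output = [0.8, 0.4, 0.1]
dl_output = [0.6, 0.7, 0.3]

fused = weighted_fusion(rf_output, dl_output, rf_weight=0.5)
predicted = predict_labels(fused, LABELS)
print(predicted)
```

Because each label is thresholded independently, an mRNA can be assigned to several compartments at once, which is what makes the prediction multilabel rather than multiclass. In practice the weight would be tuned on validation data rather than fixed.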