ProLanGO2: Protein Function Prediction with Ensemble of Encoder-Decoder Networks

Cited by: 2
Authors
Hippe, Kyle [1]
Gbenro, Sola [1]
Cao, Renzhi [1]
Affiliations
[1] Pacific Lutheran Univ, Dept Comp Sci, Tacoma, WA 98447 USA
Keywords
Protein function prediction; Recurrent Neural Network; Machine learning; AUTOMATED PREDICTION; ANNOTATIONS; SEQUENCES; DATABASE;
DOI
10.1145/3388440.3414701
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Predicting protein function from protein sequence is a major challenge in computational biology. Traditional methods that search protein sequences against existing databases may not work well in practice, particularly when the database contains few or no homologous sequences. We introduce ProLanGO2, a method that uses natural language processing and machine learning techniques to predict protein function from protein sequence alone. The method has been benchmarked blindly in the latest Critical Assessment of protein Function Annotation algorithms (CAFA 4) experiment. ProLanGO2 differs from the original ProLanGO in several ways. First, it uses the latest version of the UniProt database. Second, the UniProt database is filtered by the newly created fragment sequence database (FSD) to prepare the protein sequence language. Third, an Encoder-Decoder network, a model consisting of two RNNs (an encoder and a decoder), is trained on the dataset. Fourth, if no k-mers of a protein sequence exist in the FSD, we select the ten GO terms with the highest probability across all UniProt sequences that contain no k-mers in the FSD, and use those ten GO terms as a backup prediction for the new protein sequence. Finally, we selected the 100 best-performing models and explored all combinations of those models to select the best-performing ensemble. We benchmarked the different combinations on the CAFA 3 dataset and selected the three top-performing ensemble models for prediction in the CAFA 4 experiment, participating as CaoLab. We also evaluated ProLanGO2 on 253 unseen sequences taken from the UniProt database and compared it with several other protein function prediction methods. The results show that our method achieves strong performance among sequence-based protein function prediction methods.
Our method is available on GitHub: https://github.com/caorenzhi/ProLanGO2.git.
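The backup step described in the abstract (sequences with no k-mers in the FSD receive the ten most probable GO terms) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the k-mer lengths, the toy sequences, and the placeholder GO labels are all assumptions made for illustration, and "highest probability" is approximated here by how often a GO term appears among the uncovered sequences.

```python
from collections import Counter

def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def has_fsd_kmer(seq, fsd, ks):
    """True if any k-mer of seq (for the given lengths) is in the FSD set."""
    return any(km in fsd for k in ks for km in kmers(seq, k))

def backup_go_terms(annotated, fsd, ks, top_n=10):
    """Most frequent GO terms among sequences with no k-mer in the FSD.

    `annotated` is a list of (sequence, GO terms) pairs. Frequency stands
    in for the abstract's "highest probability" (an assumption).
    """
    counts = Counter()
    for seq, terms in annotated:
        if not has_fsd_kmer(seq, fsd, ks):
            counts.update(set(terms))  # count each term once per sequence
    return [term for term, _ in counts.most_common(top_n)]

def predict(seq, fsd, ks, model_predict, backup):
    """Use the model when the sequence is covered by the FSD, else the backup."""
    if has_fsd_kmer(seq, fsd, ks):
        return model_predict(seq)
    return backup

# Toy data: the k-mers, sequences, and GO labels are made up.
fsd = {"ACD", "KLM"}
ks = (3,)  # hypothetical k-mer length
annotated = [
    ("ACDEF", ["GO:x1"]),
    ("GGGGG", ["GO:x2", "GO:x3"]),
    ("HHHHH", ["GO:x2"]),
]
backup = backup_go_terms(annotated, fsd, ks)
fake_model = lambda seq: ["GO:model"]  # stand-in for a trained network
```

In this toy setting, `predict("WWWWW", fsd, ks, fake_model, backup)` returns the backup list because none of the sequence's 3-mers are in `fsd`, while `predict("AKLMA", fsd, ks, fake_model, backup)` is routed to the stand-in model.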
Pages: 6