Deep learning for peptide identification from metaproteomics datasets

Cited by: 9
Authors
Feng, Shichao [1]
Sterzenbach, Ryan [2]
Guo, Xuan [1]
Affiliations
[1] Univ North Texas, Dept Comp Sci & Engn, 3940 N Elm St,Ste F290, Denton, TX 76207 USA
[2] Univ North Texas, Dept Biomed Engn, Denton, TX 76203 USA
Funding
US National Institutes of Health (NIH)
Keywords
Peptide identification; Deep learning; Tandem mass spectrometry; CNN; Protein identification; Statistical model; MS/MS; Confidence; Challenges; Reveals
DOI
10.1016/j.jprot.2021.104316
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Metaproteomics is becoming widely used in microbiome research for gaining insights into the functional state of the microbial community. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. In this paper, we propose a deep-learning-based algorithm, named DeepFilter, for improving peptide identifications from a collection of tandem mass spectra. The key advantage of DeepFilter is that it does not need ad hoc training or fine-tuning, as existing filtering tools do. DeepFilter is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DeepFilter.
Significance: The identification of peptides and proteins from MS data involves searching MS/MS spectra against a predefined protein sequence database and assigning the top-scoring peptides to spectra. Existing computational tools are still far from being able to extract all the information from MS/MS data sets acquired from metaproteome samples. Systematic experimental results demonstrate that DeepFilter identified up to 12% more peptide-spectrum matches and up to 9% more proteins than existing filtering algorithms, including Percolator, Q-ranker, PeptideProphet, and iProphet, on marine and soil microbial metaproteome samples at a false discovery rate of 1%. The taxonomic analysis shows that DeepFilter found up to 7%, 10%, and 14% more species from marine, soil, and human gut samples, respectively, compared with existing filtering algorithms. DeepFilter is therefore expected to generalize well to new, previously unseen peptide-spectrum matches and can be readily applied to peptide identification from metaproteomics data.
Pages: 14