Proteoform identification based on top-down tandem mass spectra with peak error corrections

Cited by: 2
Authors
Zhan, Zhaohui [1]
Wang, Lusheng [1]
Affiliation
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
U.S. National Science Foundation
Keywords
top-down tandem mass spectra; proteoform identification; peak error correction; dynamic programming algorithms; PROTEOMICS; MS
DOI
10.1093/bib/bbab599
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
In this paper, we study the problem of finding complex proteoforms in protein databases based on top-down tandem mass spectrum data. The main difficulty is handling the combinatorial explosion of the various alterations that can occur on a protein. To overcome this combinatorial explosion, the problem has been formulated as the alignment of a proteoform mass graph (PMG) and a spectrum mass graph (SMG). The other important issue is handling the mass errors of peaks in the input spectrum. Previous methods use an error tolerance value to bound the mass differences between matched consecutive nodes/peaks in the PMG and SMG. However, this way of handling mass errors cannot guarantee that the mass difference between an arbitrary pair of matched nodes in the alignment is approximately the same in the PMG and the SMG. Large errors may accumulate if positive (or negative) errors occur consecutively over a large number of consecutive matched node pairs. The problem is severe enough that some existing software packages include a step to further refine the alignments. In this paper, we propose a new model for handling peak mass errors based on the PMG and SMG formulation. Note that the masses of sub-paths in the PMG are theoretical and are supposed to be accurate. Our method allows each peak in the input spectrum to have a predefined error range. In the alignment of the PMG and SMG, each matched peak must be assigned a mass correction within its predefined error range. After the correction, we require that the mass between any two (not necessarily consecutive) matched nodes in the PMG be identical to the mass between the corresponding two matched peaks in the SMG. Intuitively, this kind of alignment is more accurate. We design an algorithm that finds the maximum number of matched node-peak pairs between the two mass graphs (PMG and SMG) under this new constraint.
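The difference between the two error models can be made concrete with a small Python sketch. It assumes, for illustration only, that a single PMG path is reduced to a sorted list of theoretical prefix masses, that the matched peaks are listed in the same order, and that every correction is bounded by a hypothetical symmetric tolerance `err`; all function names are invented here and are not from TopMGRefine. Requiring identical masses between every pair of matched nodes and peaks is equivalent to requiring that each corrected peak differ from its theoretical mass by one common offset, so suitable corrections exist exactly when the observed shifts span at most 2·err.

```python
"""Hypothetical sketch: consecutive-pair tolerance vs. globally consistent corrections.

Assumption (illustration only): one PMG path is a sorted list of theoretical
prefix masses `theo`, and `obs[k]` is the peak matched to `theo[k]`.
"""


def passes_consecutive_tolerance(theo, obs, tol):
    """Older style of check: only consecutive matched mass differences are compared."""
    return all(
        abs((obs[k + 1] - obs[k]) - (theo[k + 1] - theo[k])) <= tol
        for k in range(len(theo) - 1)
    )


def max_pairwise_drift(theo, obs):
    """Largest mismatch of mass differences over ALL pairs, not just consecutive ones."""
    shifts = [o - t for t, o in zip(theo, obs)]
    return max(shifts) - min(shifts)


def corrections(theo, obs, err):
    """Return corrections c[k] with |c[k]| <= err such that every pairwise difference
    of corrected peaks equals the theoretical one, or None if none exist.

    (obs[j] + c[j]) - (obs[i] + c[i]) == theo[j] - theo[i] for all i, j holds iff
    obs[k] + c[k] - theo[k] is one common offset delta, i.e. c[k] = delta - shift[k].
    Such a delta exists iff max(shift) - min(shift) <= 2 * err.
    """
    shifts = [o - t for t, o in zip(theo, obs)]
    lo, hi = min(shifts), max(shifts)
    if hi - lo > 2 * err:
        return None
    delta = (lo + hi) / 2  # the midpoint keeps every |c[k]| <= (hi - lo) / 2 <= err
    return [delta - s for s in shifts]


if __name__ == "__main__":
    theo = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
    obs = [0.0, 100.008, 200.016, 300.024, 400.032, 500.040]  # +0.008 Da per step
    print(passes_consecutive_tolerance(theo, obs, 0.01))  # True
    print(round(max_pairwise_drift(theo, obs), 3))  # 0.04
    print(corrections(theo, obs, 0.01))  # None
```

In this example every consecutive error (0.008 Da) passes a 0.01 Da tolerance, yet the errors add up to a 0.04 Da drift between the first and last peaks, and no set of corrections within ±0.01 Da can reconcile them.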
The obtained alignment shows the matched node-peak pairs as well as the corrected positions of the peaks. The algorithm works well for moderate-size instances but requires a very long running time and a huge amount of memory for large instances. Therefore, we also propose a diagonal alignment algorithm, which can solve large instances in reasonable time. Experiments show that our new algorithms report alignments with a much larger number of matched node pairs. The software package and test data sets are available at https://github.com/Zeirdo/TopMGRefine.
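The abstract does not describe the diagonal alignment algorithm itself, so the following Python sketch should not be read as that algorithm. It is a hypothetical illustration, under a single-path simplification (sorted theoretical masses `theo`, sorted peak masses `obs`, per-peak tolerance `err`, all names invented here), of why the all-pairs constraint has a diagonal structure: every candidate pair (i, j) has a shift obs[j] - theo[i], and a set of pairs admits corrections within ±err exactly when their shifts lie in one band of width 2·err, that is, along a narrow diagonal of the theoretical-mass-by-peak grid. The sketch slides such a band over the sorted shifts and, inside each band, finds the longest one-to-one, order-preserving set of pairs with a longest-increasing-subsequence step. It is exact for this simplified model but takes roughly O(K² log K) time for K candidate pairs, so it is meant for small examples only.

```python
"""Hypothetical sketch: maximum matching under the all-pairs mass constraint.

Single-path simplification for illustration; not the TopMGRefine algorithm.
"""
from bisect import bisect_left


def longest_chain(pairs):
    """Length of the longest run of (i, j) pairs strictly increasing in both i and j.

    Sorting by i ascending and j descending means two pairs with the same i can
    never both appear in a strictly increasing subsequence of j values.
    """
    tails = []  # tails[k]: smallest final j over chains of length k + 1
    for _, j in sorted(pairs, key=lambda p: (p[0], -p[1])):
        k = bisect_left(tails, j)  # bisect_left keeps the j values strictly increasing
        if k == len(tails):
            tails.append(j)
        else:
            tails[k] = j
    return len(tails)


def max_consistent_matching(theo, obs, err):
    """Maximum number of matched (theoretical mass, peak) pairs that one common
    offset explains after corrections of at most +/- err per peak."""
    # Every candidate pair with its shift; pairs sharing a shift lie on one diagonal.
    cands = sorted(
        (obs[j] - theo[i], i, j) for i in range(len(theo)) for j in range(len(obs))
    )
    best, right = 0, 0
    for left in range(len(cands)):
        # Band [shift_left, shift_left + 2 * err]; `right` never moves backwards.
        while right < len(cands) and cands[right][0] - cands[left][0] <= 2 * err:
            right += 1
        band = [(i, j) for _, i, j in cands[left:right]]
        best = max(best, longest_chain(band))
    return best


if __name__ == "__main__":
    theo = [0.0, 100.0, 200.0, 300.0, 400.0]
    obs = [0.004, 99.998, 150.0, 200.012, 300.0, 399.996]  # 150.0 is a noise peak
    print(max_consistent_matching(theo, obs, 0.01))  # 5
    print(max_consistent_matching(theo, obs, 0.005))  # 4: the 0.012 Da shift no longer fits
```

Because a band is anchored at each candidate shift in turn, the optimum is never missed: the best set of pairs always fits inside the band that starts at its own smallest shift.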
Pages: 12