Development of a novel clustering tool for linear peptide sequences

Cited by: 53
Authors
Dhanda, Sandeep K. [1]
Vaughan, Kerrie [1]
Schulten, Veronique [1]
Grifoni, Alba [1]
Weiskopf, Daniela [1]
Sidney, John [1]
Peters, Bjoern [1, 2]
Sette, Alessandro [1, 2]
Affiliations
[1] La Jolla Inst Allergy & Immunol, Div Vaccine Discovery, La Jolla, CA 92037 USA
[2] Univ Calif San Diego, Dept Med, San Diego, CA 92103 USA
Funding
National Institutes of Health (USA)
Keywords
Allergy; Antigens/Peptides/Epitopes; Bioinformatics; MHC/HLA; Viral; EPITOPE; REACTIVITY; MOLECULES; ALIGNMENT; ALLERGEN; DATABASE
DOI
10.1111/imm.12984
Chinese Library Classification number
R392 [Medical immunology]; Q939.91 [Immunology]
Discipline classification code
100102
Abstract
Epitopes identified in large-scale screens of overlapping peptides often share significant levels of sequence identity, complicating the analysis of epitope-related data. Clustering algorithms are often used to facilitate these analyses, but available methods are generally insufficient in their capacity to define biologically meaningful epitope clusters in the context of the immune response. To fulfil this need we developed an algorithm that generates epitope clusters based on representative or consensus sequences. This tool allows the user to cluster peptide sequences on the basis of a specified level of identity by selecting among three different method options. These include the 'clique method', in which all members of the cluster must share the same minimal level of identity with each other, and the 'connected graph method', in which all members of a cluster must share a defined level of identity with at least one other member of the cluster. In cases where it is not possible to define a clear consensus sequence with the connected graph method, a third option provides a novel 'cluster-breaking algorithm' for consensus sequence driven sub-clustering. Herein we demonstrate the tool's clustering performance and applicability using (i) a selection of dengue virus epitopes for the 'clique method', (ii) sets of allergen-derived peptides from related species for the 'connected graph method' and (iii) large data sets of eluted ligand, major histocompatibility complex binding and T-cell recognition data captured within the Immune Epitope Database (IEDB) with the newly developed 'cluster-breaking algorithm'. This novel clustering tool is accessible at http://tools.iedb.org/cluster2/.
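The difference between the 'connected graph method' and the 'clique method' described in the abstract can be illustrated with a minimal Python sketch. This is not the IEDB tool's implementation: the `identity` function below is an assumed, simplified ungapped identity measure, and the function names and example peptides are hypothetical. The 'cluster-breaking algorithm' is not sketched because the abstract does not specify it.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Assumed identity measure (not the tool's own): slide the shorter
    peptide along the longer one without gaps and return the best
    fraction of matching positions, relative to the shorter length."""
    if not a or not b:
        raise ValueError("peptides must be non-empty")
    short, long_ = sorted((a, b), key=len)
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(x == y for x, y in zip(short, long_[offset:]))
        best = max(best, matches)
    return best / len(short)


def connected_graph_clusters(peptides, threshold):
    """Connected graph idea: a peptide joins a cluster if it meets the
    threshold with at least one member, i.e. clusters are the connected
    components of the identity graph."""
    n = len(peptides)
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if identity(peptides[i], peptides[j]) >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(peptides[k] for k in component))
    return clusters


def is_clique(cluster, threshold):
    """Clique criterion: every pair of members meets the threshold."""
    return all(identity(a, b) >= threshold for a, b in combinations(cluster, 2))


# Hypothetical peptides: the first three form a chain (90%, 90%, 80%).
peptides = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIMM", "WWWWWWWWWW"]
clusters = connected_graph_clusters(peptides, 0.9)
print(clusters)
print([is_clique(c, 0.9) for c in clusters])
```

In this example the first three peptides end up in one connected-graph cluster even though the first and third share only 80% identity, so that cluster fails the clique criterion at a 90% threshold. This is the kind of chained cluster for which a single clear consensus sequence may be hard to define.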
Pages: 331-345
Page count: 15