A database of unique protein sequence identifiers for proteome studies

Cited by: 18
Authors
Babnigg, Gyorgy [1]
Giometti, Carol S. [1]
Affiliation
[1] Argonne Natl Lab, Div Biosci, Prot Mapping Grp, Argonne, IL 60439 USA
Keywords
protein sequence identification; SEGUID database
DOI
10.1002/pmic.200600032
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches, or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID), derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Because SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) need to be calculated only once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and make it possible to search multiple protein sequence databases, thereby increasing the probability of finding the most valid protein identifications.
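The abstract describes SEGUIDs as identifiers computed from the primary sequence alone, so identical sequences from different databases share one identifier. The minimal Python sketch below assumes the construction used by Biopython's `Bio.SeqUtils.CheckSum.seguid` (SHA-1 digest of the upper-cased sequence, Base64-encoded, trailing `=` padding removed). The `build_index` helper and the example accessions and sequences are hypothetical and serve only to show how database-specific accessions collapse onto a shared key.

```python
import base64
import hashlib
from collections import defaultdict


def seguid(sequence: str) -> str:
    """Return a SEGUID-style checksum for a protein sequence.

    Assumption: SHA-1 digest of the upper-cased sequence, Base64-encoded,
    with trailing '=' padding removed (27 characters for a 20-byte digest).
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def build_index(records):
    """Group (database, accession, sequence) records by SEGUID.

    Hypothetical helper: identical sequences from different databases
    map to the same key, so their accessions can be reconciled.
    """
    index = defaultdict(list)
    for database, accession, sequence in records:
        index[seguid(sequence)].append((database, accession))
    return dict(index)


if __name__ == "__main__":
    # Hypothetical records; accessions and sequences are illustrative only.
    records = [
        ("NCBInr", "gi|0001", "MKTAYIAKQR"),
        ("UniProt", "P00001", "mktayiakqr"),  # same sequence, lower case
        ("private", "LAB-42", "MKTAYIAKQRQISFVK"),
    ]
    for key, accessions in build_index(records).items():
        print(key, accessions)
```

Because the key depends only on the sequence, a per-sequence prediction such as pI or Mr can be computed once per key and reused for every accession grouped under it, which is the reuse the abstract describes.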
Pages: 4514-4522
Number of pages: 9