A database of unique protein sequence identifiers for proteome studies

Cited by: 18

Authors:

Babnigg, Gyorgy [1]

Giometti, Carol S. [1]

Affiliations:

[1] Argonne Natl Lab, Div Biosci, Prot Mapping Grp, Argonne, IL 60439 USA

Source:

PROTEOMICS | 2006 / Vol. 6 / Issue 16

Keywords:

protein sequence identification; SEGUID database;

DOI:

10.1002/pmic.200600032

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, M_r) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
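The abstract does not spell out how an identifier is derived from a primary sequence. As a minimal sketch, the commonly documented SEGUID construction takes the SHA-1 digest of the uppercase sequence and Base64-encodes it with the trailing padding removed. The example below assumes that construction; the whitespace stripping, the function names `seguid` and `link_identifiers`, and the database names and accession strings are all hypothetical and chosen only for illustration.

```python
import base64
import hashlib
from collections import defaultdict


def seguid(sequence: str) -> str:
    """Return a sequence-derived identifier (assumed SEGUID construction).

    Assumption: SHA-1 over the uppercase sequence, Base64-encoded,
    with the trailing '=' padding removed. Stripping whitespace first
    is a convenience added here, not something the abstract states.
    """
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def link_identifiers(records):
    """Group (database, identifier) pairs by the identifier of their sequence.

    `records` is an iterable of (database, identifier, sequence) tuples.
    """
    index = defaultdict(list)
    for database, identifier, sequence in records:
        index[seguid(sequence)].append((database, identifier))
    return dict(index)


# Hypothetical entries: two databases holding the same sequence under
# different accessions, plus one entry that differs by a single residue.
records = [
    ("db_a", "A0001", "MKTAYIAKQR"),
    ("db_b", "B9042", "mktayiakqr"),
    ("db_a", "A0002", "MKTAYIAKQK"),
]
linked = link_identifiers(records)

for key, entries in linked.items():
    print(key, entries)
```

Because the key depends only on the sequence, the two accessions for the identical sequence fall under one key regardless of how either database annotates or renames the entry, while the single-residue variant receives its own key.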

Pages: 4514-4522

Page count: 9

