Identification of related gene/protein names based on an HMM of name variations

Cited by: 15
Authors
Yeganova, L [1 ]
Smith, L [1 ]
Wilbur, WJ [1 ]
Affiliation
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, Computat Biol Branch, NIH, Bethesda, MD 20894 USA
Keywords
automatic term recognition; gene name variation; hidden Markov model; information extraction
DOI
10.1016/j.compbiolchem.2003.12.003
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system that, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model. © 2003 Published by Elsevier Ltd.
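As a rough illustration of the alignment idea mentioned in the abstract, the sketch below scores a query name against a list of candidate names with a standard global alignment (Needleman-Wunsch style) dynamic program. It is not the authors' system: the substitution scores here are fixed, hand-picked values, whereas the abstract describes a mutation matrix that varies under the control of a trained hidden Markov model. The function names `align_score`, `simple_sub_score`, and `rank_related`, and all numeric scores, are assumptions made for this example.

```python
def align_score(a, b, sub_score, gap=-1.0):
    """Global alignment score between strings a and b (dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub_score(a[i - 1], b[j - 1]),  # match or substitute
                dp[i - 1][j] + gap,                                # gap in b
                dp[i][j - 1] + gap,                                # gap in a
            )
    return dp[n][m]


def simple_sub_score(x, y):
    """Hypothetical fixed scoring: a stand-in for a trained, varying matrix."""
    if x == y:
        return 2.0
    if x.lower() == y.lower():
        return 1.0  # case-only difference, a mild variation
    return -1.0


def rank_related(query, names, top_k=3):
    """Return the top_k (score, name) pairs, highest alignment score first."""
    scored = [(align_score(query, name, simple_sub_score), name) for name in names]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top_k]
```

For example, `rank_related("IL-2", ["IL2", "TNF", "IL-2"])` places the exact match first and the hyphen-free variant `IL2` ahead of the unrelated `TNF`. A trainable model would replace `simple_sub_score` with scores learned from known name variants.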
Pages: 97-107
Page count: 11