Identification of related gene/protein names based on an HMM of name variations

Times cited: 15
Authors
Yeganova, L [1]
Smith, L [1]
Wilbur, WJ [1]
Affiliation
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, Computat Biol Branch, NIH, Bethesda, MD 20894 USA
Keywords
automatic term recognition; gene name variation; hidden Markov model; information extraction
DOI
10.1016/j.compbiolchem.2003.12.003
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model. © 2003 Published by Elsevier Ltd.
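The general idea in the abstract, alignment by dynamic programming where the substitution ("mutation") scores switch according to a hidden state, can be illustrated with a small Python sketch. This is not the authors' system. The two states (a hypothetical "strict" state and a "lenient" state that tolerates case, digit, and punctuation differences), every score, and the STAY/SWITCH transition penalties are invented and hand-set for illustration, whereas the paper's HMM is fully trainable. Decoding here is a Viterbi-style maximum over both alignments and state paths, which is an assumption of this sketch.

```python
NEG_INF = float("-inf")

# Hypothetical per-state scoring functions ("mutation matrices"); not from the paper.
def strict_sub(x: str, y: str) -> float:
    return 2.0 if x == y else -3.0

def strict_gap(c: str) -> float:
    return -3.0

def lenient_sub(x: str, y: str) -> float:
    if x.lower() == y.lower():
        return 1.0  # case differences are tolerated
    if x.isdigit() and y.isdigit():
        return -1.0  # digit-for-digit substitutions are mildly penalized
    return -2.0

def lenient_gap(c: str) -> float:
    # Skipping punctuation or spaces (e.g. "IL-2" vs "IL2") is cheap in this state.
    return -1.0 if not c.isalnum() else -2.0

STATES = [(strict_sub, strict_gap), (lenient_sub, lenient_gap)]
STAY, SWITCH = 0.0, -1.0  # hypothetical hand-set transition scores

def hmm_align_score(a: str, b: str) -> float:
    """Best alignment score when each column is scored by the current hidden state."""
    n, m, k = len(a), len(b), len(STATES)
    # V[i][j][s]: best score aligning a[:i] with b[:j], last column emitted in state s.
    V = [[[NEG_INF] * k for _ in range(m + 1)] for _ in range(n + 1)]
    V[0][0] = [0.0] * k  # any state may be the starting state
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for s, (sub, gap) in enumerate(STATES):
                moves = []  # (previous i, previous j, emission score of this column)
                if i > 0 and j > 0:
                    moves.append((i - 1, j - 1, sub(a[i - 1], b[j - 1])))
                if i > 0:
                    moves.append((i - 1, j, gap(a[i - 1])))
                if j > 0:
                    moves.append((i, j - 1, gap(b[j - 1])))
                best = NEG_INF
                for pi, pj, emit in moves:
                    for prev in range(k):
                        trans = STAY if prev == s else SWITCH
                        best = max(best, V[pi][pj][prev] + trans + emit)
                V[i][j][s] = best
    return max(V[n][m])

def rank_related(query: str, names: list[str]) -> list[tuple[str, float]]:
    """Rank candidate names by score against the query, best first."""
    scored = [(name, hmm_align_score(query, name)) for name in names]
    return sorted(scored, key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    for name, score in rank_related("IL2", ["IL-2", "il2", "TNF", "IL2RA"]):
        print(f"{name}\t{score:.1f}")
```

The function `rank_related` mimics the query-against-a-list use case described in the abstract. The cost is O(n·m·k²) per pair for k states. In a trainable model, the scoring functions and transition constants would be parameters estimated from examples of related name pairs rather than fixed by hand.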
Pages: 97-107
Number of pages: 11
Related papers
  • [21] On-line Arabic Handwritten Personal Names Recognition System based on HMM
    Abdelazeem, Sherif
    Eraqi, Hesham M.
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 1304 - 1308
  • [22] Tagging gene and protein names in biomedical text
    Tanabe, L
    Wilbur, WJ
    BIOINFORMATICS, 2002, 18 (08) : 1124 - 1132
  • [23] Identification of Divergent Protein Domains by Combining HMM-HMM Comparisons and Co-Occurrence Detection
    Ghouila, Amel
    Florent, Isabelle
    Guerfali, Fatma Zahra
    Terrapon, Nicolas
    Laouini, Dhafer
    Ben Yahia, Sadok
    Gascuel, Olivier
    Brehelin, Laurent
    PLOS ONE, 2014, 9 (06):
  • [24] Using HMM based recognizers for writer identification and verification
    Schlapbach, A
    Bunke, H
    NINTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, PROCEEDINGS, 2004, : 167 - 172
  • [25] Hybrid architecture based on HMM/MLFNN for speaker identification
    Beijing Daxue Xuebao Ziran Kexue Ban, 3 (359-367):
  • [26] Extraction of gene/protein names involved in each stage of spermatogenesis based on literature mining
    Zhu, Jun
    Yin, Jianping
    Zhao, Zhiheng
    Zhu, En
    Ban, Rongjun
    Zhu, J. (cqzhujun@126.com), 1600, Science Press (51): 1352 - 1358
  • [27] Identification of the gene variations in human IKKA
    Hagiwara, K
    Tsuchiya, N
    Takazoe, M
    Yamamoto, K
    Tokunaga, K
    IMMUNOGENETICS, 1999, 50 (5-6) : 363 - 365
  • [29] Arrhythmogenic cardiomyopathy: Identification of desmosomal gene variations and desmosomal protein expression in variation carriers
    Wang, Li
    Liu, Shenghua
    Zhang, Hongliang
    Hu, Shengshou
    Wei, Yingjie
    EXPERIMENTAL AND THERAPEUTIC MEDICINE, 2018, 15 (03) : 2255 - 2262
  • [30] Tagging Gene and Protein Names in Full Text Articles
    National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 20894, United States
    Proc. Annu. Meet. Assoc. Comput. Linguist., (9-13):