Identification of related gene/protein names based on an HMM of name variations

Cited by: 15
Authors
Yeganova, L [1]
Smith, L [1]
Wilbur, WJ [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, Computat Biol Branch, NIH, Bethesda, MD 20894 USA
Keywords
automatic term recognition; gene name variation; hidden Markov model; information extraction
DOI
10.1016/j.compbiolchem.2003.12.003
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model. © 2003 Published by Elsevier Ltd.
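As a rough illustration of the alignment idea in the abstract, the sketch below ranks candidate names against a query with a plain global-alignment dynamic program. It is not the authors' system: the substitution scores are fixed and hand-picked here, whereas the paper lets the mutation matrix vary under the control of a trained hidden Markov model. The names `substitution_score`, `alignment_score`, `rank_related`, and `GAP`, and all numeric scores, are assumptions made for this example.

```python
from typing import List, Tuple

GAP = -1.0  # assumed fixed gap penalty (hypothetical value)


def substitution_score(a: str, b: str) -> float:
    """Hand-picked character substitution scores (illustrative only).

    In the paper these scores would come from a mutation matrix that an
    HMM adjusts along the name; here they are constants.
    """
    if a == b:
        return 2.0
    if a.lower() == b.lower():
        return 1.5  # case variation, e.g. 'I' vs 'i'
    if not a.isalnum() and not b.isalnum():
        return 1.0  # separator variation, e.g. '-' vs ' '
    return -1.0


def alignment_score(query: str, candidate: str) -> float:
    """Global alignment score via standard dynamic programming."""
    prev = [j * GAP for j in range(len(candidate) + 1)]
    for i in range(1, len(query) + 1):
        curr = [i * GAP] + [0.0] * len(candidate)
        for j in range(1, len(candidate) + 1):
            curr[j] = max(
                prev[j - 1] + substitution_score(query[i - 1], candidate[j - 1]),
                prev[j] + GAP,      # gap in the candidate
                curr[j - 1] + GAP,  # gap in the query
            )
        prev = curr
    return prev[-1]


def rank_related(query: str, names: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k names with the highest alignment score to the query."""
    scored = [(name, alignment_score(query, name)) for name in names]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Calling `rank_related("IL-2", ["IL 2", "il2", "TNF-alpha"])` places `"IL 2"` first, because a separator swap costs less than a gap under these assumed scores. A position-dependent scoring scheme, as the abstract describes, would replace the constant `substitution_score` with values chosen by the model state.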
Pages: 97-107
Number of pages: 11
Related papers
50 in total
  • [31] Identifying gene and protein names from biological texts
    Xuan, WJ
    Watson, SJ
    Akil, H
    Meng, F
    PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, : 639 - 643
  • [32] A hybrid approach to protein name identification in biomedical texts
    Seki, K
    Mostafa, J
    INFORMATION PROCESSING & MANAGEMENT, 2005, 41 (04) : 723 - 743
  • [33] A simple approach for protein name identification: prospects and limits
    Katrin Fundel
    Daniel Güttler
    Ralf Zimmer
    Joannis Apostolakis
    BMC Bioinformatics, 6
  • [34] Gene/protein name recognition based on support vector machine using dictionary as features
    Tomohiro Mitsumori
    Sevrani Fation
    Masaki Murata
    Kouichi Doi
    Hirohumi Doi
    BMC Bioinformatics, 6
  • [35] A simple approach for protein name identification: prospects and limits
    Fundel, K
    Güttler, D
    Zimmer, R
    Apostolakis, J
    BMC BIOINFORMATICS, 2005, 6 (Suppl 1)
  • [36] Gene/protein name recognition based on support vector machine using dictionary as features
    Mitsumori, T
    Fation, S
    Murata, M
    Doi, K
    Doi, H
    BMC BIOINFORMATICS, 2005, 6 (Suppl 1)
  • [37] Identification of a novel signature based on unfolded protein response-related gene for predicting prognosis in bladder cancer
    Ke Zhu
    Liu Xiaoqiang
    Wen Deng
    Gongxian Wang
    Bin Fu
    Human Genomics, 15
  • [38] Identification of a novel signature based on unfolded protein response-related gene for predicting prognosis in bladder cancer
    Ke Zhu
    Liu Xiaoqiang
    Wen Deng
    Wang, Gongxian
    Bin Fu
    HUMAN GENOMICS, 2021, 15 (01)
  • [39] Identification of Phase-Separation-Protein-Related Function Based on Gene Ontology by Using Machine Learning Methods
    Ma, Qinglan
    Huang, FeiMing
    Guo, Wei
    Feng, KaiYan
    Huang, Tao
    Cai, Yudong
    LIFE-BASEL, 2023, 13 (06):
  • [40] A Chinese person name recognition system based on agent-based HMM position tagging model
    Guo, Yimo
    Gao, Huanping
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 4069 - +