Identification of related gene/protein names based on an HMM of name variations

Cited by: 15
Authors
Yeganova, L [1]
Smith, L [1]
Wilbur, WJ [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, Computat Biol Branch, NIH, Bethesda, MD 20894 USA
Keywords
automatic term recognition; gene name variation; hidden Markov model; information extraction
DOI
10.1016/j.compbiolchem.2003.12.003
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model. © 2003 Published by Elsevier Ltd.
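The alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it uses a plain global (Needleman-Wunsch style) dynamic programming alignment with a fixed, hand-picked scoring scheme, whereas the paper lets the mutation matrix vary under a trained hidden Markov model. The function names, the separator set, and all score values are assumptions chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical separator characters that often differ between name variants.
SEPARATORS = set(" -_/")


def substitution_score(a: str, b: str) -> float:
    """Fixed, illustrative substitution scores (not learned from data)."""
    if a == b:
        return 2.0
    if a.lower() == b.lower():
        return 1.5  # case-only difference
    if a in SEPARATORS and b in SEPARATORS:
        return 1.0  # one separator swapped for another
    return -1.0


def gap_penalty(ch: str) -> float:
    """Dropping or inserting a separator is cheaper than any other character."""
    return -0.25 if ch in SEPARATORS else -1.0


def alignment_score(query: str, candidate: str) -> float:
    """Global alignment score of two names via dynamic programming."""
    n, m = len(query), len(candidate)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # First column and row: align a prefix against the empty string.
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap_penalty(query[i - 1])
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap_penalty(candidate[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                # Align the two characters with each other.
                dp[i - 1][j - 1] + substitution_score(query[i - 1], candidate[j - 1]),
                # Leave a query character unaligned.
                dp[i - 1][j] + gap_penalty(query[i - 1]),
                # Leave a candidate character unaligned.
                dp[i][j - 1] + gap_penalty(candidate[j - 1]),
            )
    return dp[n][m]


def rank_names(query: str, names: List[str]) -> List[Tuple[str, float]]:
    """Return the names sorted by descending alignment score to the query."""
    scored = [(name, alignment_score(query, name)) for name in names]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_names("IL-2", ["IL2", "il-2", "TNF alpha", "IL-2"])` places the exact match first and the unrelated name last. In the system the abstract describes, fixed functions like `substitution_score` and `gap_penalty` would be replaced by scores that depend on the state of a trained hidden Markov model.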
Pages: 97 - 107
Number of pages: 11
Related papers
50 in total
  • [41] Identification of Protein-Protein Interaction Associated Functions Based on Gene Ontology
    Zhang, Yu-Hang
    Huang, FeiMing
    Li, JiaBo
    Shen, WenFeng
    Chen, Lei
    Feng, KaiYan
    Huang, Tao
    Cai, Yu-Dong
    PROTEIN JOURNAL, 2024, 43 (03) : 477 - 486
  • [42] Identification of candidate genes related to pancreatic cancer based on analysis of gene co-expression and protein-protein interaction network
    Zhang, Tiejun
    Wang, Xiaojuan
    Yue, Zhenyu
    ONCOTARGET, 2017, 8 (41) : 71105 - 71116
  • [43] A writer identification and verification system using HMM based recognizers
    Schlapbach, Andreas
    Bunke, Horst
    PATTERN ANALYSIS AND APPLICATIONS, 2007, 10 (01) : 33 - 43
  • [44] An Identification Method for Road Hypnosis Based on XGBoost-HMM
    Chen, Longfei
    Jiao, Chenyang
    Wang, Bin
    Wang, Xiaoyuan
    Wang, Jingheng
    Zhang, Han
    Han, Junyan
    Shen, Cheng
    Feng, Kai
    Wang, Quanzheng
    Liu, Yi
    SENSORS, 2025, 25 (06)
  • [45] An HMM-based subband processing approach to speaker identification
    Higgins, JE
    Damper, RI
    AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2001, 2091 : 169 - 174
  • [46] IDENTIFICATION OF A GENE ENCODING AN ACTIN-RELATED PROTEIN IN DROSOPHILA-MELANOGASTER
    FRANKEL, S
    HEINTZELMAN, MB
    ARTAVANISTSAKONAS, S
    MOOSEKER, MS
    MOLECULAR BIOLOGY OF THE CELL, 1992, 3 : A37 - A37
  • [47] Name-based measures of neighborhood composition: how telling are neighbors' names?
    Kruse, Hanno
    Dollmann, Joerg
    SURVEY RESEARCH METHODS, 2017, 11 (04) : 435 - 450
  • [48] FAMILY-GROUP NAMES BASED ON NAME OF GENUS ELMIS LATREILLE (COLEOPTERA)
    STEYSKAL, GC
    PROCEEDINGS OF THE ENTOMOLOGICAL SOCIETY OF WASHINGTON, 1975, 77 (01) : 59 - 60
  • [49] Missing names from the Fungal Name Repositories found in the literature related to Chinese fungi
    Wang, Ke
    Wang, Yong-Hui
    Zhao, Ming-Jun
    Kirk, Paul M.
    Yao, Yi-Jian
    PHYTOTAXA, 2019, 411 (01) : 1 - 22
  • [50] The neglected name Statice auriculifolia (Plumbaginaceae) and its related names: A long history of nomenclatural intricacy
    Del Guacchio, Emanuele
    Erben, Matthias
    Caputo, Paolo
    TAXON, 2019, 68 (05) : 1093 - 1100