Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification

Cited by: 23
Authors
Schuemie, Martijn J. [1]
Mons, Barend [1]
Weeber, Marc [1]
Kors, Jan A. [1]
Affiliations
[1] Erasmus Univ, Med Ctr, Dept Med Informat, NL-3000 DR Rotterdam, Netherlands
Keywords
gene name identification; information extraction; dictionary; thesaurus; spelling variations
DOI
10.1016/j.jbi.2006.09.002
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases, and rule-based generation of spelling variations. Both methods have been reported in the literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect, and some appeared to have a detrimental effect on precision. © 2006 Elsevier Inc. All rights reserved.
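The two techniques named in the abstract, merging several source dictionaries and generating spelling variations by rule, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy databases, the gold mentions, and the single letter/digit separator rule are hypothetical and are not taken from the paper, whose 23 rules and test sets are not listed in this record.

```python
import re
from collections import defaultdict


def merge_dictionaries(*sources):
    """Union the synonym sets of each gene identifier across source dictionaries."""
    merged = defaultdict(set)
    for source in sources:
        for gene_id, names in source.items():
            merged[gene_id].update(names)
    return dict(merged)


def spelling_variants(name):
    """Return the name plus variants from one hypothetical rule.

    Illustrative rule: for names made of letters followed by digits,
    emit forms with no separator, a hyphen, and a space (IL2, IL-2, IL 2).
    """
    variants = {name}
    match = re.fullmatch(r"([A-Za-z]+)[- ]?(\d+)", name)
    if match:
        stem, number = match.groups()
        variants.update({f"{stem}{number}", f"{stem}-{number}", f"{stem} {number}"})
    return variants


def expand(dictionary):
    """Apply the variant rule to every name of every gene identifier."""
    return {
        gene_id: set().union(*(spelling_variants(n) for n in names))
        for gene_id, names in dictionary.items()
    }


def recall(dictionary, gold_mentions):
    """Fraction of gold mentions matched by any dictionary name (case-insensitive)."""
    if not gold_mentions:
        return 0.0
    known = {n.lower() for names in dictionary.values() for n in names}
    return sum(m.lower() in known for m in gold_mentions) / len(gold_mentions)


# Toy data, invented for this sketch.
db_a = {"G1": {"IL2"}}
db_b = {"G1": {"interleukin 2"}, "G2": {"p53"}}
gold = ["IL-2", "p53", "interleukin 2", "BRCA1"]

merged = merge_dictionaries(db_a, db_b)
expanded = expand(merged)

print(recall(db_a, gold))      # 0.0
print(recall(merged, gold))    # 0.5
print(recall(expanded, gold))  # 0.75
```

On this toy data, recall rises with each step, mirroring the direction of the reported result. The same rule also produces forms such as "p-53" that may never occur in text or may collide with other terms, which hints at how a variation rule can have no effect or can cost precision.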
Pages: 316-324
Page count: 9