A dictionary-based approach to normalizing gene names in one domain of knowledge from the biomedical literature

Cited by: 9
Authors
Galvez, Carmen [1 ]
de Moya-Anegon, Felix [2 ]
Affiliations
[1] Univ Granada, Dept Informat Sci, Commun & Documentat Fac, Granada, Spain
[2] Inst Publ Goods & Policies IPP, SCImago Res Grp CSIC, Madrid, Spain
Keywords
Linguistics; Dictionary; Gene name normalization; Genes; LITERATURE-BASED DISCOVERY; MOLECULAR-BIOLOGY; MEDICAL LITERATURES; PROTEIN NAMES; FISH OIL; TEXT; INFORMATION; NOMENCLATURE; GUIDELINES; ONTOLOGY;
DOI
10.1108/00220411211200301
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Purpose - Gene term variation is a shortcoming in text-mining applications based on biomedical literature-based knowledge discovery. The purpose of this paper is to propose a technique for normalizing gene names in biomedical literature.

Design/methodology/approach - Under this proposal, the normalized forms can be characterized as a unique gene symbol, defined as the official symbol or normalized name. The unification method involves five stages: collection of the gene term, using the resources provided by the Entrez Gene database; encoding of gene-naming terms in a table or binary matrix; design of a parametrized finite-state graph (P-FSG); automatic generation of a dictionary; and matching based on dictionary look-up to transform the gene mentions into the corresponding unified form.

Findings - The findings show that the approach yields a high percentage of recall. Precision is only moderately high, basically due to ambiguity problems between gene-naming terms and words and abbreviations in general English.

Research limitations/implications - The major limitation of this study is that biomedical abstracts were analyzed instead of full-text documents. The number of under-normalization and over-normalization errors is reduced considerably by limiting the realm of application to biomedical abstracts in a well-defined domain.

Practical implications - The system can be used for practical tasks in biomedical literature mining. Normalized gene terms can be used as input to literature-based gene clustering algorithms, for identifying hidden gene-to-disease, gene-to-gene and gene-to-literature relationships.

Originality/value - Few systems for gene term variation handling have been developed to date. The technique described performs gene name normalization by dictionary look-up.
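
The last two stages named in the abstract (dictionary generation and look-up matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the P-FSG stage is not reproduced, and the variant table `GENE_VARIANTS`, the helper names, and the case-insensitive longest-match policy are all assumptions made for illustration. The paper draws its terms from Entrez Gene; the two entries below are hand-picked examples.

```python
import re

# Hypothetical variant table: official symbol -> surface forms seen in text.
# Illustrative entries only; the paper collects its terms from Entrez Gene.
GENE_VARIANTS = {
    "TP53": ["TP53", "p53", "tumor protein p53"],
    "BRCA1": ["BRCA1", "BRCA-1", "breast cancer 1"],
}


def build_dictionary(variants):
    """Invert the table into a lower-cased variant -> official symbol map."""
    lookup = {}
    for symbol, forms in variants.items():
        for form in forms:
            lookup[form.lower()] = symbol
    return lookup


def compile_matcher(lookup):
    """Compile one regex over all variants, trying longer variants first."""
    forms = sorted(lookup, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in forms)
    # The lookarounds stop a variant from matching inside a longer token.
    return re.compile(
        r"(?<![A-Za-z0-9])(?:" + alternation + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def normalize(text, lookup, matcher):
    """Replace every recognized gene mention with its official symbol."""
    return matcher.sub(lambda m: lookup[m.group(0).lower()], text)


lookup = build_dictionary(GENE_VARIANTS)
matcher = compile_matcher(lookup)
print(normalize("Mutations in p53 and breast cancer 1 were studied.", lookup, matcher))
```

Case-insensitive matching of this kind favors recall and makes collisions with ordinary English words and abbreviations more likely, which is the same kind of ambiguity the abstract cites as the reason precision is only moderately high.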
Pages: 5-30
Number of pages: 26