Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations

Times cited: 30
Authors
Munkhdalai, Tsendsuren [1]
Li, Meijing [1]
Batsuren, Khuyagbaatar [1]
Park, Hyeon Ah [1]
Choi, Nak Hyeon [1]
Ryu, Keun Ho [1]
Affiliation
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju, South Korea
Source
JOURNAL OF CHEMINFORMATICS | 2015 / Vol. 7
Funding
National Research Foundation of Singapore;
Keywords
Feature Representation Learning; Semi-Supervised Learning; Named Entity Recognition; Conditional Random Fields;
DOI
10.1186/1758-2946-7-S1-S9
Chinese Library Classification (CLC) number
O6 [Chemistry];
Subject classification code
0703;
Abstract
Background: Chemical and biomedical named entity recognition (NER) is an essential prerequisite for effective text mining of biochemical text. With the recent growth in the volume of biomedical literature, exploiting unlabeled text to improve system performance has become an active and challenging research topic in text mining. We present a semi-supervised learning method that efficiently exploits unlabeled data to incorporate domain knowledge into an NER model and improve its performance. The method combines natural language processing (NLP) tasks for text preprocessing, word representation features learned from a large amount of unlabeled text for feature extraction, and conditional random fields (CRFs) for token classification. Apart from free text in the domain, the method relies on no lexicon or dictionary, so that the system remains applicable to other NER tasks on biomedical text.

Results: We extended BANNER, a biomedical NER system, with the proposed method, yielding an integrated system that can be applied to chemical and drug NER as well as biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER. It scales to millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems through the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved F-measures of 85.68% and 86.47% on the test sets of the CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and 87.04% on the official test set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
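The abstract describes turning word representations learned from unlabeled text into features for a CRF, without any lexicon or dictionary. A minimal sketch of one common form of such features, assuming each word has already been assigned a Brown-style hierarchical cluster bit string: the `TOY_CLUSTERS` mapping, the `PREFIX_LENGTHS` values, and the `token_features` helper are hypothetical illustrations, not the feature set actually used in BANNER-CHEMDNER. Prefixes of the bit string at several lengths let a rare chemical name share features with frequent words in the same region of the cluster hierarchy.

```python
from typing import Dict, List

# Hypothetical toy data: word -> Brown-style cluster bit string, as if learned
# from unlabeled domain text. Not taken from the paper.
TOY_CLUSTERS: Dict[str, str] = {
    "aspirin": "110100",
    "ibuprofen": "110101",
    "inhibits": "0110",
    "the": "000",
}

# Assumption: prefix lengths giving coarse-to-fine cluster granularity.
PREFIX_LENGTHS = (2, 4, 6)


def token_features(tokens: List[str], i: int, clusters: Dict[str, str]) -> List[str]:
    """Return sparse string features for tokens[i], in the style fed to a CRF."""
    word = tokens[i]
    lower = word.lower()
    feats = [f"w={lower}"]
    # Orthographic cues that require no lexicon or dictionary.
    if any(ch.isdigit() for ch in word):
        feats.append("has_digit")
    if "-" in word:
        feats.append("has_hyphen")
    if word[:1].isupper():
        feats.append("init_cap")
    # Word-representation features: prefixes of the cluster bit string.
    path = clusters.get(lower)
    if path is None:
        feats.append("cluster=UNK")
    else:
        for n in PREFIX_LENGTHS:
            feats.append(f"cluster_p{n}={path[:n]}")
    return feats


if __name__ == "__main__":
    sentence = ["Aspirin", "inhibits", "COX-1"]
    for idx in range(len(sentence)):
        print(sentence[idx], token_features(sentence, idx, TOY_CLUSTERS))
```

Because "aspirin" and "ibuprofen" share the prefixes `11` and `1101`, a CRF weight learned for one of them partly transfers to the other, even if only one appears in the labeled training data.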
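Token classification with a linear-chain CRF ends in a decoding step that selects the highest-scoring label sequence for a sentence. A minimal sketch of Viterbi decoding, assuming per-token label scores (`emissions`) and label-to-label scores (`transitions`) have already been produced by a trained model; the BIO label set and all numbers below are made up for illustration, and this is not the decoder used inside BANNER.

```python
from typing import List, Sequence

LABELS = ["O", "B", "I"]  # BIO tags for a single entity type (assumption)


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the label indices maximizing summed emission and transition scores.

    emissions[t][y]: score of label y at token t (at least one token required).
    transitions[a][b]: score of moving from label a to label b.
    """
    n_labels = len(transitions)
    best = list(emissions[0])  # best path score ending in each label at t = 0
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for y in range(n_labels):
            # Best previous label for reaching label y at token t.
            prev = max(range(n_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            ptrs.append(prev)
        best = new_best
        backptrs.append(ptrs)
    # Start from the best final label and follow the back-pointers.
    y = max(range(n_labels), key=lambda b: best[b])
    path = [y]
    for ptrs in reversed(backptrs):
        y = ptrs[y]
        path.append(y)
    return path[::-1]


if __name__ == "__main__":
    # Toy scores for "Aspirin inhibits COX-1"; O -> I is heavily penalized.
    emissions = [[0.1, 2.0, 0.5], [1.5, 0.2, 0.3], [0.2, 0.4, 1.0]]
    transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print([LABELS[i] for i in viterbi(emissions, transitions)])
```

Picking the best label per token independently would give `B O I`, which is not a valid BIO sequence; the transition penalty lets the sequence-level decoder return `B O B` instead.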
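The results are reported as F-measure, the harmonic mean of precision P and recall R, F = 2PR / (P + R). A minimal sketch, assuming exact-match evaluation in which each mention is a (document id, start offset, end offset) tuple and gold and predicted mentions are compared as sets, micro-averaged over all documents. The official CHEMDNER and BioCreative II evaluation scripts have their own details (for example, the BioCreative II gene mention task also accepted alternative gold boundaries), which this sketch does not reproduce; the `prf` helper and the toy mentions are hypothetical.

```python
from typing import Set, Tuple

Mention = Tuple[str, int, int]  # (doc_id, start_offset, end_offset)


def prf(gold: Set[Mention], predicted: Set[Mention]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure under exact span matching."""
    tp = len(gold & predicted)  # predictions that exactly match a gold mention
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    gold = {("d1", 0, 7), ("d1", 17, 22), ("d2", 5, 14)}
    pred = {("d1", 0, 7), ("d1", 17, 21), ("d2", 5, 14), ("d2", 30, 38)}
    p, r, f = prf(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Under exact matching, the prediction that ends one character early (17 to 21 instead of 17 to 22) counts as both a false positive and a false negative.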
Pages: 8