Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations

Times cited: 30

Authors:

Munkhdalai, Tsendsuren [1]

Li, Meijing [1]

Batsuren, Khuyagbaatar [1]

Park, Hyeon Ah [1]

Choi, Nak Hyeon [1]

Ryu, Keun Ho [1]

Affiliation:

[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju, South Korea

Source:

JOURNAL OF CHEMINFORMATICS | 2015 / Vol. 7

Funding:

National Research Foundation of Singapore;

Keywords:

Feature Representation Learning; Semi-Supervised Learning; Named Entity Recognition; Conditional Random Fields;

DOI:

10.1186/1758-2946-7-S1-S9

CLC classification number:

O6 [Chemistry];

Subject classification code:

0703;

Abstract:

Background: Chemical and biomedical named entity recognition (NER) is an essential prerequisite for effective text mining of biochemical text. With the rapid growth of the biomedical literature, exploiting unlabeled text to improve system performance has become an active and challenging research topic in text mining. We present a semi-supervised learning method that efficiently exploits unlabeled data to incorporate domain knowledge into an NER model and improve its performance. The proposed method combines Natural Language Processing (NLP) tasks for text preprocessing, word representation features learned from a large amount of text data for feature extraction, and conditional random fields (CRFs) for token classification. Apart from free text in the domain, the method relies on no lexicon or dictionary, so that the system remains applicable to other NER tasks on biomedical text.

Results: We extended BANNER, a biomedical NER system, with the proposed method, yielding an integrated system that can be applied to chemical and drug NER or to biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER. It scales to millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems through the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved F-measures of 85.68% and 86.47% on the test sets of the CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and an F-measure of 87.04% on the official test set of the BioCreative II gene mention task, showing strong performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
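As a rough illustration of the feature-extraction step described in the abstract, the Python sketch below attaches word-representation features to the per-token feature dictionaries that a linear-chain CRF consumes. It is not the BANNER-CHEMDNER implementation: the abstract does not say which word representations were learned, so the sketch assumes Brown-style hierarchical cluster bit strings. The cluster table `WORD_CLUSTERS`, its bit strings, the prefix lengths, and the functions `token_features` and `sentence_features` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical cluster table. In practice such paths would be learned from a
# large unlabeled corpus; these bit strings are invented for illustration.
WORD_CLUSTERS: Dict[str, str] = {
    "aspirin": "110100101",
    "ibuprofen": "110100110",
    "inhibits": "0111010",
    "the": "0010",
}

PREFIX_LENGTHS = (4, 6, 10)  # assumed prefix lengths for cluster features


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i], combining orthographic cues with
    word-representation (cluster-path prefix) features."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "lower": word.lower(),
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "has_digit": any(c.isdigit() for c in word),
        "has_hyphen": "-" in word,
        "suffix3": word[-3:],
    }
    path = WORD_CLUSTERS.get(word.lower())
    if path is not None:
        # Prefixes of the hierarchical path give coarse-to-fine cluster
        # features; a prefix longer than the path is simply the full path.
        for n in PREFIX_LENGTHS:
            feats[f"cluster_p{n}"] = path[:n]
    else:
        feats["cluster_unk"] = True
    # Context features: lowercase forms of the neighbouring tokens.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, the input shape a linear-chain CRF expects."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for f in sentence_features(["Aspirin", "inhibits", "COX-2"]):
        print(f)
```

The idea behind using path prefixes of several lengths is that the CRF can share evidence between words that fall into the same coarse cluster (for example, two drug names) even when one of them never appears in the labeled training data, which is one way unlabeled text can carry domain knowledge into the model without a dictionary.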


Pages: 8
