A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

Cited by: 0
Authors
Ngoc-Trinh Vu [1, 2]
Van-Hien Tran [1]
Thi-Huyen-Trang Doan [1]
Hoang-Quynh Le [1]
Mai-Vu Tran [1]
Affiliations
[1] Vietnam Natl Univ Hanoi, Univ Engn & Technol, Knowledge Technol Lab, Hanoi, Vietnam
[2] Vietnam Natl Oil & Gas Grp, Vietnam Petr Inst, Hanoi, Vietnam
Keywords
Named entity recognition; Phenotype; Machine learning; Biomedical ontology
DOI
10.1007/978-3-319-17996-4_13
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Building a labeled corpus with sufficient data and good coverage, while keeping cost, effort, and time under control, is a popular research topic in natural language processing. Constructing training data automatically or semi-automatically has therefore become a concern of the research community. For this reason, we consider the problem of building a corpus for phenotype entity recognition, with class-specific feature detectors from unlabeled data, based on over 10,260 unique terms (more than 15,000 synonyms) describing human phenotypic features in the Human Phenotype Ontology (HPO) and about 9,000 unique terms (about 24,000 synonyms) of mouse abnormal phenotype descriptions in the Mammalian Phenotype Ontology. The corpus was evaluated on three corpora (the Khordad corpus, Phenominer 2012, and Phenominer 2013) using the Maximum Entropy and Beam Search method. The performance is good on all three corpora, with F-scores of 31.71% on the Phenominer 2012 corpus, 35.77% on the Phenominer 2013 corpus, and 78.36% on the Khordad corpus.
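As a rough illustration of how ontology terms and synonyms might be used to label unlabeled text, the sketch below applies greedy longest-match dictionary lookup and emits BIO tags. This is an assumption for illustration only: the abstract does not describe the authors' matching procedure, and the function names (`build_lexicon`, `label_sentence`), the `PHENOTYPE` tag, and the toy term IDs are all hypothetical rather than taken from the paper or from HPO.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[Tuple[str, ...], str]


def build_lexicon(terms: Dict[str, List[str]]) -> Lexicon:
    """Map each lowercased, whitespace-tokenized name or synonym to its term ID."""
    lexicon: Lexicon = {}
    for term_id, names in terms.items():
        for name in names:
            lexicon[tuple(name.lower().split())] = term_id
    return lexicon


def label_sentence(tokens: List[str], lexicon: Lexicon, max_len: int = 6) -> List[str]:
    """Tag tokens with BIO labels using greedy longest-match lookup (hypothetical scheme)."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B-PHENOTYPE"
            for j in range(i + 1, i + matched):
                tags[j] = "I-PHENOTYPE"
            i += matched
        else:
            i += 1
    return tags


# Toy entries with made-up IDs; not real ontology content.
toy_terms = {
    "TOY:0001": ["small head", "microcephaly"],
    "TOY:0002": ["seizure", "seizures"],
}
lexicon = build_lexicon(toy_terms)
tokens = "The patient had a small head and seizures".split()
tags = label_sentence(tokens, lexicon)
```

Here `tags` marks "small head" as one two-token entity and "seizures" as a single-token entity. Exact matching of this kind misses inflected or reordered mentions, which is one plausible reason a learned tagger would be trained on top of such automatically labeled data.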
Pages: 141-149
Page count: 9