A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

Authors
Ngoc-Trinh Vu [1, 2]
Van-Hien Tran [1 ]
Thi-Huyen-Trang Doan [1 ]
Hoang-Quynh Le [1 ]
Mai-Vu Tran [1 ]
Affiliations
[1] Vietnam Natl Univ Hanoi, Univ Engn & Technol, Knowledge Technol Lab, Hanoi, Vietnam
[2] Vietnam Natl Oil & Gas Grp, Vietnam Petr Inst, Hanoi, Vietnam
Keywords
Named entity recognition; Phenotype; Machine learning; Biomedical ontology
DOI
10.1007/978-3-319-17996-4_13
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Building a labeled corpus with sufficient data and good coverage, while keeping cost, effort, and time manageable, is a popular research topic in natural language processing. Constructing training data automatically or semi-automatically has therefore become a concern of the research community. For this reason, we consider the problem of building a corpus for phenotype entity recognition, with class-specific feature detectors derived from unlabeled data, based on over 10,260 unique terms (more than 15,000 synonyms) describing human phenotypic features in the Human Phenotype Ontology (HPO) and about 9,000 unique terms (about 24,000 synonyms) describing abnormal mouse phenotypes in the Mammalian Phenotype Ontology. The corpus was evaluated on three corpora, the Khordad corpus, Phenominer 2012, and Phenominer 2013, using the Maximum Entropy and Beam Search methods. The F-scores are 31.71% and 35.77% on the Phenominer 2012 and Phenominer 2013 corpora, respectively, and 78.36% on the Khordad corpus.
Pages: 141-149
Page count: 9
Related papers
50 in total
  • [31] Nested Named Entity Recognition as Building Local Hypergraphs
    Yan, Yukun
    Cai, Bingling
    Song, Sen
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 13878 - 13886
  • [32] Enriching Ontologies for Named Entity Disambiguation
    Hien Thanh Nguyen
    Tru Hoang Cao
    SEMAPRO 2010: THE FOURTH INTERNATIONAL CONFERENCE ON ADVANCES IN SEMANTIC PROCESSING, 2010, : 37 - 42
  • [33] A named entity recognition method of English product
    Gu, Chuan
    1600, Universidad Central de Venezuela (55)
  • [34] A Hybrid Method for Persian Named Entity Recognition
    Ahmadi, Farid
    Moradi, Hamed
    2015 7th Conference on Information and Knowledge Technology (IKT), 2015,
  • [35] Building a Named Entity Annotated Bilingual English-Vietnamese Corpus
    Tuan-An Dao
    Hung-Thinh Truong
    Long Nguyen
    Dien Dinh
    PROCEEDINGS OF 2018 10TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE), 2018, : 61 - 66
  • [36] Building a Pediatric Medical Corpus: Word Segmentation and Named Entity Annotation
    Zan Hongying
    Li Wenxin
    Zhang Kunli
    Ye Yajuan
    Chang Baobao
    Sui Zhifang
    CHINESE LEXICAL SEMANTICS (CLSW 2020), 2021, 12278 : 652 - 664
  • [37] Quantitative Analysis of Art Market Using Ontologies, Named Entity Recognition and Machine Learning: A Case Study
    Filipiak, Dominik
    Agt-Rickauer, Henning
    Hentschel, Christian
    Filipowska, Agata
    Sack, Harald
    BUSINESS INFORMATION SYSTEMS (BIS 2016), 2016, 255 : 79 - 90
  • [38] Emerging Named Entity Recognition on Retrieval Features in an Affective Computing Corpus
    Nawroth, Christian
    Engel, Felix
    Mc Kevitt, Paul
    Hemmje, Matthias L.
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 2860 - 2868
  • [39] LegalNERo: A linked corpus for named entity recognition in the Romanian legal domain
    Pais, Vasile
    Mitrofan, Maria
    Gasan, Carol Luca
    Ianov, Alexandru
    Ghita, Corvin
    Coneschi, Vlad Silviu
    Onut, Andrei
    SEMANTIC WEB, 2024, 15 (03) : 831 - 844
  • [40] A web-based Bengali news corpus for named entity recognition
    Asif Ekbal
    Sivaji Bandyopadhyay
    Language Resources and Evaluation, 2008, 42 : 173 - 182