A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

Authors
Ngoc-Trinh Vu [1, 2]
Van-Hien Tran [1 ]
Thi-Huyen-Trang Doan [1 ]
Hoang-Quynh Le [1 ]
Mai-Vu Tran [1 ]
Affiliations
[1] Vietnam Natl Univ Hanoi, Univ Engn & Technol, Knowledge Technol Lab, Hanoi, Vietnam
[2] Vietnam Natl Oil & Gas Grp, Vietnam Petr Inst, Hanoi, Vietnam
Keywords
Named entity recognition; Phenotype; Machine learning; Biomedical ontology
DOI
10.1007/978-3-319-17996-4_13
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Building a labeled corpus that contains sufficient data and has good coverage, while keeping cost, effort, and time manageable, is a popular research topic in natural language processing. Constructing training data automatically or semi-automatically has therefore become a matter of interest to the research community. For this reason, we consider the problem of building a corpus for phenotype entity recognition, with class-specific feature detectors derived from unlabeled data, based on more than 10,260 unique terms (more than 15,000 synonyms) describing human phenotypic features in the Human Phenotype Ontology (HPO) and about 9,000 unique terms (about 24,000 synonyms) describing abnormal mouse phenotypes in the Mammalian Phenotype Ontology. The corpus was evaluated on three corpora (the Khordad corpus, Phenominer 2012, and Phenominer 2013) using the Maximum Entropy and Beam Search method. Performance is good on the three corpora, with F-scores of 31.71% on the Phenominer 2012 corpus, 35.77% on the Phenominer 2013 corpus, and 78.36% on the Khordad corpus.
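The abstract does not spell out the labeling pipeline, so the following is only a minimal sketch of one common way ontology terms can be turned into training labels: greedy longest-match dictionary lookup that emits BIO tags. The toy lexicon, the `PHENOTYPE` tag name, and the helper names `tokenize` and `label_sentence` are all assumptions made for illustration; they are not taken from the paper, and a real setup would load terms and synonyms from HPO and the Mammalian Phenotype Ontology instead.

```python
import re

# Hypothetical toy lexicon standing in for ontology terms and synonyms.
LEXICON = {"short stature", "small stature", "cleft palate", "hearing loss"}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def label_sentence(tokens, lexicon, max_len=5):
    """Tag tokens with BIO labels using greedy longest-match lookup.

    This is an illustrative assumption, not the authors' procedure.
    """
    terms = {tuple(term.lower().split()) for term in lexicon}
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                matched = n
                break
        if matched:
            tags[i] = "B-PHENOTYPE"
            for j in range(i + 1, i + matched):
                tags[j] = "I-PHENOTYPE"
            i += matched
        else:
            i += 1
    return tags


tokens = tokenize("The patient presented with short stature and cleft palate.")
tags = label_sentence(tokens, LEXICON)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Token and tag pairs in this form could then be fed to a sequence classifier such as the Maximum Entropy model with beam search decoding that the abstract mentions. Exact string matching of this kind misses inflected or reordered mentions, which may be one reason automatically built corpora score differently across evaluation sets.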
Pages: 141-149
Page count: 9