Annotation of specialized corpora using a comprehensive entity and relation scheme

Cited by: 0
Authors
Deleger, Louise [1]
Ligozat, Anne-Laure [1, 2]
Grouin, Cyril [1]
Zweigenbaum, Pierre [1]
Neveol, Aurelie [1]
Affiliations
[1] CNRS, UPR 3251, LIMSI, F-91403 Orsay, France
[2] ENSIIE, F-91000 Evry, France
Keywords
Annotation; Clinical Texts; Natural Language Processing; INFORMATION; CORPUS
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Annotated corpora are essential resources for many applications in Natural Language Processing. They provide insight on the linguistic and semantic characteristics of the genre and domain covered, and can be used for the training and evaluation of automatic tools. In the biomedical domain, annotated corpora of English texts have become available for several genres and subfields. However, very few similar resources are available for languages other than English. In this paper we present an effort to produce a high-quality corpus of clinical documents in French, annotated with a comprehensive scheme of entities and relations. We present the annotation scheme as well as the results of a pilot annotation study covering 35 clinical documents in a variety of subfields and genres. We show that high inter-annotator agreement can be achieved using a complex annotation scheme.
Pages: 1267-1274
Number of pages: 8
Related papers
50 in total
  • [31] A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation
    Deriu, Jan
    Mlynchyk, Katsiaryna
    Schlaepfer, Philippe
    Rodrigo, Alvaro
    von Gruenigen, Dirk
    Kaiser, Nicolas
    Stockinger, Kurt
    Agirre, Eneko
    Cieliebak, Mark
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 897 - 911
  • [32] Automated identification of hyponymy in French specialized corpora using Sketch Engine
    San Martin, Antonio
    Trekker, Catherine
    Leon-Arauz, Pilar
    TERMINOLOGY, 2022, 28 (02): : 264 - 298
  • [33] The impact of using different annotation schemes on named entity recognition
    Alshammari, Nasser
    Alanazi, Saad
    EGYPTIAN INFORMATICS JOURNAL, 2021, 22 (03) : 295 - 302
  • [34] A French Corpus and Annotation Schema for Named Entity Recognition and Relation Extraction of Financial News
    Jabbari, Ali
    Sauvage, Olivier
    Zeine, Hamada
    Chergui, Hamza
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2293 - 2299
  • [35] A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
    Schoen, Saskia
    Mironova, Veselina
    Gabryszak, Aleksandra
    Hennig, Leonhard
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 4445 - 4451
  • [36] Automatic Annotation of Speech Corpora using Complementary GMM and DNN Acoustic Models
    Georgescu, Alexandru-Lucian
    Cucu, Horia
    2018 41ST INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2018, : 794 - 797
  • [37] Using Continuous Integration to Organize and Monitor the Annotation Process of Domain Specific Corpora
    Schreiber, Marc
    Barkschat, Kai
    Kraft, Bodo
    2014 5TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2014,
  • [38] Thai Named Entity Corpus Annotation Scheme and Self Verification by BiLSTM-CNN-CRF
    Sornlertlamvanich, Virach
    Suriyachay, Kitiya
    Charoenporn, Thatsanee
    HUMAN LANGUAGE TECHNOLOGY: CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2019, 2022, 13212 : 143 - 160
  • [39] Evaluating corpora for named entity recognition using character-level features
    Whitelaw, C
    Patrick, J
    AI 2003: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2003, 2903 : 910 - 921
  • [40] Student Research Abstract: Dual Architecture for Name Entity Extraction and Relation Extraction with Applications in Medical Corpora
    Caballero, Ernesto Quevedo
    37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022, : 883 - 886