Contemplata, a Free Platform for Constituency Treebank Annotation

Cited by: 0
Authors
Waszczuk, Jakub [1]
Wang, Ilaine [2, 3]
Antoine, Jean-Yves [2]
Halftermeyer, Anais [3]
Affiliations
[1] Heinrich Heine Univ, Dusseldorf, Germany
[2] Univ Tours, LIFAT, Tours, France
[3] Univ Orleans, LIFO, Orleans, France
Keywords
treebank; syntactic annotation; spontaneous speech; French language; constituent trees;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
This paper describes Contemplata, an annotation platform that offers a generic solution for building treebanks and for enriching them with relations between syntactic nodes. Contemplata is dedicated to the annotation of constituency trees. The framework includes support for syntactic parsers, which provide automatic annotations to be manually revised. This balance between automatic parsing and manual revision reduces the annotator workload, which in turn favours data reliability. The paper presents the software architecture of Contemplata, describes its practical use, and finally gives two examples of annotation projects that were conducted on the platform.
Pages: 7222 - 7229
Number of pages: 8
Related papers
50 in total
  • [31] Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation
    Prasad, Rashmi
    Webber, Bonnie
    Joshi, Aravind
    COMPUTATIONAL LINGUISTICS, 2014, 40 (04) : 921 - 950
  • [32] Towards building a Kashmiri Treebank: Setting up the Annotation Pipeline
    Bhat, Riyaz Ahmad
    Bhat, Shahid Mushtaq
    Sharma, Dipti Misra
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 748 - 752
  • [33] The Index Thomisticus Treebank Project: Annotation, Parsing and Valency Lexicon
    McGillivray, Barbara
    Passarotti, Marco
    Ruffolo, Paolo
    TRAITEMENT AUTOMATIQUE DES LANGUES, 2009, 50 (02) : 103 - 127
  • [34] Partial Parsing as a Method to Expedite Dependency Annotation of a Hindi Treebank
    Gupta, Mridul
    Yadav, Vineet
    Husain, Samar
    Sharma, Dipti Misra
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 1930 - 1935
  • [35] CCGweb: a New Annotation Tool and a First Quadrilingual CCG Treebank
    Evang, Kilian
    Abzianidze, Lasha
    Bos, Johan
    13TH LINGUISTIC ANNOTATION WORKSHOP (LAW XIII), 2019, : 37 - 42
  • [36] Adjusting Indonesian Multiword Expression Annotation to the Penn Treebank Format
    Arwidarasti, Jessica Naraiswari
    Alfina, Ika
    Krisnadhi, Adila Alfa
    2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020, : 75 - 80
  • [37] Diacritic Annotation in the Arabic Treebank and Its Impact on Parser Evaluation
    Maamouri, Mohamed
    Kulick, Seth
    Bies, Ann
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 2773 - 2776
  • [38] ICON: A Linguistically-Motivated Large-Scale Benchmark Indonesian Constituency Treebank
    Lim, Ee Suan
    Leong, Wei Qi
    Nguyen, Thanh Ngan
    Kng, Wei Ming
    Tjhi, William Chandra
    Adhista, Dea
    Purwarianti, Ayu
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (08)
  • [39] The annotation guidelines of the Latin Dependency Treebank and Index Thomisticus Treebank: The treatment of some specific syntactic constructions in Latin
    Bamman, David
    Passarotti, Marco
    Busa, Roberto
    Crane, Gregory
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 71 - 76
  • [40] Syntactic Annotation in the I3rab Dependency Treebank
    Halabi, Dana
    Awajan, Arafat
    Fayyoumi, Ebaa
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2021, 18 (3A) : 381 - 392