An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition

Cited by: 2
Authors
Hoxha, Klesti [1]
Baxhaku, Artur [1]
Affiliation
[1] Univ Tirana, Fac Nat Sci, Tirana 1001, Albania
Keywords
Named entity recognition; natural language processing; language corpora; semi-automatic annotation; information extraction
DOI
10.2478/cait-2018-0009
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To meet the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a custom gazetteer generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline for human-annotated corpora, or as a training corpus where no other is available.
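Gazetteer-based tagging can be illustrated with a minimal sketch. It assumes a greedy longest-match lookup over whitespace-separated tokens and a BIO tag scheme. The function name `tag_with_gazetteer`, the entity types (`PER`, `LOC`, `ORG`) and the sample gazetteer entries are all hypothetical and are not taken from the paper. The authors' actual matching strategy may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer format: a tuple of tokens mapped to an entity type.
Gazetteer = Dict[Tuple[str, ...], str]

# Hypothetical sample entries, for illustration only.
GAZETTEER: Gazetteer = {
    ("Klesti", "Hoxha"): "PER",
    ("Tiranë",): "LOC",
    ("Tiranës",): "LOC",
    ("Universiteti", "i", "Tiranës"): "ORG",
}


def tag_with_gazetteer(tokens: List[str], gazetteer: Gazetteer) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against a gazetteer."""
    max_len = max((len(entry) for entry in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len, match_type = 0, None
        # Try the longest candidate span first, so a multi-token entry
        # such as "Universiteti i Tiranës" beats the shorter "Tiranës".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = gazetteer.get(tuple(tokens[i:i + n]))
            if entity_type is not None:
                match_len, match_type = n, entity_type
                break
        if match_type is None:
            i += 1  # no entry starts here; leave the token as "O"
            continue
        tags[i] = "B-" + match_type
        for j in range(i + 1, i + match_len):
            tags[j] = "I-" + match_type
        i += match_len  # skip past the matched span
    return tags


if __name__ == "__main__":
    sentence = "Universiteti i Tiranës ndodhet në Tiranë .".split()
    for token, tag in zip(sentence, tag_with_gazetteer(sentence, GAZETTEER)):
        print(token, tag)
```

Tokens whose surface form is not listed in the gazetteer are left as `O`. A plain exact-match lookup like this therefore trades recall for simplicity.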
Pages: 95-108
Page count: 14