A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Cited by: 24
Authors
Hamdi, Ahmed [1]
Pontes, Elvys Linhares [1]
Boros, Emanuela [1]
Thi Tuyet Hai Nguyen [1]
Hackl, Guenter [2]
Moreno, Jose G. [3]
Doucet, Antoine [1]
Affiliations
[1] Univ La Rochelle, L3i, La Rochelle, France
[2] Innsbruck Univ Innovat GmbH, Innsbruck, Austria
[3] Univ Toulouse, IRIT, Toulouse, France
Keywords
datasets; multilingual; diachronic historical newspapers; named entity recognition; entity linking; stance detection
DOI
10.1145/3404835.3463255
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Named entity processing over historical texts is increasingly used owing to the large volume of documents and archives stored in digital libraries. However, because annotated historical resources are scarce, information extraction performance lags behind that on contemporary texts. In this paper, we describe the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking, enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential for developing and evaluating named entity processing systems. It also makes it possible to improve the performance of existing approaches on historical documents, which in turn enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.
Pages: 2328-2334
Number of pages: 7
Related papers
50 in total
  • [31] On the Strength of Character Language Models for Multilingual Named Entity Recognition
    Yu, Xiaodong
    Mayhew, Stephen
    Sammons, Mark
    Roth, Dan
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3073 - 3077
  • [32] Enhancing Entity Boundary Detection for Better Chinese Named Entity Recognition
    Chen, Chun
    Kong, Fang
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 20 - 25
  • [33] Named Entity Recognition an Aid to Improve Multilingual Entity Filling In Language-Independent Approach
    Bhagavatula, Mahathi
    Santosh, G. S. K.
    Varma, Vasudeva
    PROCEEDINGS OF THE FIRST WORKSHOP ON INFORMATION AND KNOWLEDGE MANAGEMENT FOR DEVELOPING REGION, 2012, : 3 - 9
  • [34] Named Entity Recognition and Classification in Historical Documents: A Survey
    Ehrmann, Maud
    Hamdi, Ahmed
    Pontes, Elvys Linhares
    Romanello, Matteo
    Doucet, Antoine
    ACM COMPUTING SURVEYS, 2024, 56 (02)
  • [35] HistNERo: Historical Named Entity Recognition for the Romanian Language
    Avram, Andrei-Marius
    Iuga, Andreea
    Manolache, George-Vlad
    Matei, Vlad-Cristian
    Miclius, Razvan-Gabriel
    Muntean, Vlad-Andrei
    Sorlescu, Manuel-Petru
    Serban, Dragon-Andrei
    Urse, Adrian-Dinu
    Pais, Vasile
    Cerce, Dumitru-Clementin
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT III, 2024, 14806 : 126 - 144
  • [36] Nested named entity recognition in historical archive text
    Byrne, Kate
    ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 589 - 596
  • [37] Multilingual Autoregressive Entity Linking
    De Cao, Nicola
    Wu, Ledell
    Popat, Kashyap
    Artetxe, Mikel
    Goyal, Naman
    Plekhanov, Mikhail
    Zettlemoyer, Luke
    Cancedda, Nicola
    Riedel, Sebastian
    Petroni, Fabio
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 274 - 290
  • [38] DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect
    Moussa, Hanane Nour
    Mourhir, Asmaa
    DATA IN BRIEF, 2023, 48
  • [39] AsNER - Annotated Dataset and Baseline for Assamese Named Entity recognition
    Pathak, Dhrubajyoti
    Nandi, Sukumar
    Sarmah, Priyankoo
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6571 - 6577
  • [40] EduNER: a Chinese named entity recognition dataset for education research
    Xu Li
    Chengkun Wei
    Zhuoren Jiang
    Wenlong Meng
    Fan Ouyang
    Zihui Zhang
    Wenzhi Chen
    Neural Computing and Applications, 2023, 35 : 17717 - 17731