A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Cited by: 24
Authors
Hamdi, Ahmed [1]
Pontes, Elvys Linhares [1]
Boros, Emanuela [1]
Nguyen, Thi Tuyet Hai [1]
Hackl, Guenter [2]
Moreno, Jose G. [3]
Doucet, Antoine [1]
Affiliations
[1] Univ La Rochelle, L3i, La Rochelle, France
[2] Innsbruck Univ Innovat GmbH, Innsbruck, Austria
[3] Univ Toulouse, IRIT, Toulouse, France
Keywords
datasets; multilingual; diachronic historical newspapers; named entity recognition; entity linking; stance detection
DOI
10.1145/3404835.3463255
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Named entity processing over historical texts is increasingly used because of the massive volume of documents and archives stored in digital libraries. However, because annotated historical resources are scarce, information extraction performance lags behind that on contemporary texts. In this paper, we introduce the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential for developing and evaluating named entity processing systems. It also allows the performance of existing approaches on historical documents to be improved, which enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.
Pages: 2328-2334
Page count: 7
Related papers
50 in total
  • [1] Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents
    Ehrmann, Maud
    Romanello, Matteo
    Najem-Meyer, Sven
    Doucet, Antoine
    Clematide, Simon
    EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION (CLEF 2022), 2022, 13390 : 423 - 446
  • [2] Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
    Ehrmann, Maud
    Romanello, Matteo
    Doucet, Antoine
    Clematide, Simon
    ADVANCES IN INFORMATION RETRIEVAL, PT II, 2022, 13186 : 347 - 354
  • [3] Dataset Enhancement and Multilingual Transfer for Named Entity Recognition in the Indonesian Language
    Khairunnisa, Siti Oryza
    Chen, Zhousi
    Komachi, Mamoru
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [4] Multilingual Transformers for Named Entity Recognition
    Viksna, Rinalds
    Skadina, Inguna
    BALTIC JOURNAL OF MODERN COMPUTING, 2022, 10 (03) : 457 - 469
  • [5] Joint Learning of Named Entity Recognition and Entity Linking
    Martins, Pedro Henrique
    Marinho, Zita
    Martins, Andre F. T.
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019 : 190 - 196
  • [6] A Named Entity Recognition Dataset for Turkish
    Kucuk, Dilek
    Kucuk, Dogan
    Arici, Nursal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016 : 329 - 332
  • [7] VoxEL: A Benchmark Dataset for Multilingual Entity Linking
    Rosales-Mendez, Henry
    Hogan, Aidan
    Poblete, Barbara
    SEMANTIC WEB - ISWC 2018, PT II, 2018, 11137 : 170 - 186
  • [8] Language Clustering for Multilingual Named Entity Recognition
    Shaffer, Kyle
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021 : 40 - 45
  • [9] KazNERD: Kazakh Named Entity Recognition Dataset
    Yeshpanov, Rustem
    Khassanov, Yerbolat
    Varol, Huseyin Atakan
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022 : 417 - 426
  • [10] DroNER: Dataset for drone named entity recognition
    Silalahi, Swardiantara
    Ahmad, Tohari
    Studiawan, Hudan
    DATA IN BRIEF, 2023, 48