A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Cited by: 24
Authors
Hamdi, Ahmed [1]
Pontes, Elvys Linhares [1]
Boros, Emanuela [1]
Thi Tuyet Hai Nguyen [1]
Hackl, Guenter [2]
Moreno, Jose G. [3]
Doucet, Antoine [1]
Affiliations
[1] Univ La Rochelle, L3i, La Rochelle, France
[2] Innsbruck Univ Innovat GmbH, Innsbruck, Austria
[3] Univ Toulouse, IRIT, Toulouse, France
Keywords
datasets; multilingual; diachronic historical newspapers; named entity recognition; entity linking; stance detection
DOI
10.1145/3404835.3463255
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Named entity processing over historical texts is increasingly used, owing to the massive volume of documents and archives stored in digital libraries. However, because annotated historical resources are scarce, information extraction performance falls behind that on contemporary texts. In this paper, we introduce the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential for developing and evaluating named entity processing systems. It also allows the performance of existing approaches on historical documents to be improved, which enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.
Pages: 2328-2334
Page count: 7