A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Cited by: 24
Authors
Hamdi, Ahmed [1]
Pontes, Elvys Linhares [1]
Boros, Emanuela [1]
Thi Tuyet Hai Nguyen [1]
Hackl, Guenter [2]
Moreno, Jose G. [3]
Doucet, Antoine [1]
Affiliations
[1] Univ La Rochelle, L3i, La Rochelle, France
[2] Innsbruck Univ Innovat GmbH, Innsbruck, Austria
[3] Univ Toulouse, IRIT, Toulouse, France
Keywords
datasets; multilingual; diachronic historical newspapers; named entity recognition; entity linking; stance detection
DOI
10.1145/3404835.3463255
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Named entity processing over historical texts is increasingly used, owing to the massive volume of documents and archives stored in digital libraries. However, because annotated historical resources are scarce, information extraction performance lags behind that on contemporary texts. In this paper, we introduce the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking, enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential for developing and evaluating named entity processing systems. It also allows the performance of existing approaches on historical documents to be improved, which enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.
Pages: 2328-2334
Page count: 7