The Causal News Corpus: Annotating Causal Relations in Event Sentences from News

Authors
Tan, Fiona Anting [1]
Hurriyetoglu, Ali [2]
Caselli, Tommaso [3]
Oostdijk, Nelleke [4]
Nomoto, Tadashi [5]
Hettiarachchi, Hansi [6]
Ameer, Iqra [7]
Uca, Onur [8]
Liza, Farhana Ferdousi [9]
Hu, Tiancheng [10]
Affiliations
[1] Natl Univ Singapore, Inst Data Sci, Singapore, Singapore
[2] Koc Univ, Istanbul, Turkey
[3] Univ Groningen, Groningen, Netherlands
[4] Radboud Univ Nijmegen, Nijmegen, Netherlands
[5] Natl Inst Japanese Literature, Tokyo, Japan
[6] Birmingham City Univ, Birmingham, W Midlands, England
[7] Inst Politecn Nacl, Ctr Invest Comp, Mexico City, DF, Mexico
[8] Mersin Univ, Dept Sociol, Mersin, Turkey
[9] Univ East Anglia, Norwich, Norfolk, England
[10] Swiss Fed Inst Technol, Zurich, Switzerland
Funding
National Research Foundation, Singapore
Keywords
causality; event causality; text mining; natural language understanding
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines for event causality and conventional causality corpora, which focus more on linguistics. Many guidelines restrict themselves to explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels indicating whether or not they contain causal relations. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well, with an F1 score of 81.20% on the test set and 83.46% in 5-fold cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task for laypeople in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.
Pages: 2298-2310
Page count: 13