The Causal News Corpus: Annotating Causal Relations in Event Sentences from News

Cited by: 0
Authors
Tan, Fiona Anting [1]
Hurriyetoglu, Ali [2]
Caselli, Tommaso [3]
Oostdijk, Nelleke [4]
Nomoto, Tadashi [5]
Hettiarachchi, Hansi [6]
Ameer, Iqra [7]
Uca, Onur [8]
Liza, Farhana Ferdousi [9]
Hu, Tiancheng [10]
Affiliations
[1] Natl Univ Singapore, Inst Data Sci, Singapore, Singapore
[2] Koc Univ, Istanbul, Turkey
[3] Univ Groningen, Groningen, Netherlands
[4] Radboud Univ Nijmegen, Nijmegen, Netherlands
[5] Natl Inst Japanese Literature, Tokyo, Japan
[6] Birmingham City Univ, Birmingham, W Midlands, England
[7] Inst Politecn Nacl, Ctr Invest Comp, Mexico City, DF, Mexico
[8] Mersin Univ, Dept Sociol, Mersin, Turkey
[9] Univ East Anglia, Norwich, Norfolk, England
[10] Swiss Fed Inst Technol, Zurich, Switzerland
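Funding
National Research Foundation, Singapore
Keywords
causality; event causality; text mining; natural language understanding
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract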
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines for event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels indicating whether each sentence contains causal relations. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well, with an F1 score of 81.20% on the test set and 83.46% in 5-fold cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task for laypeople in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.
Pages: 2298-2310
Number of pages: 13