Unsupervised Key Event Detection from Massive Text Corpora

Cited by: 6
Authors
Zhang, Yunyi [1]
Guo, Fang [2]
Shen, Jiaming [3]
Han, Jiawei [1]
Affiliations
[1] UIUC, Champaign, IL 61820, USA
[2] Westlake University, Hangzhou, People's Republic of China
[3] Google Research, New York, NY, USA
Funding
U.S. National Science Foundation;
Keywords
Unsupervised Event Detection; Document Classification; Pretrained Language Models; Phrase Extraction;
DOI
10.1145/3534678.3539395
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge. As real-world events have different granularities, from the top-level themes to key events and then to event mentions corresponding to concrete actions, there are generally two lines of research: (1) theme detection tries to identify from a news corpus major themes (e.g., "2019 Hong Kong Protests" versus "2020 U.S. Presidential Election") which have very distinct semantics; and (2) action extraction aims to extract from a single document mention-level actions (e.g., "the police hit the left arm of the protester") that are often too fine-grained for comprehending the real-world event. In this paper, we propose a new task, key event detection at the intermediate level, which aims to detect from a news corpus key events (e.g., HK Airport Protest on Aug. 12-14), each happening at a particular time/location and focusing on the same topic. This task can bridge event understanding and structuring and is inherently challenging because of (1) the thematic and temporal closeness of different key events and (2) the scarcity of labeled data due to the fast-evolving nature of news articles. To address these challenges, we develop an unsupervised key event detection framework, EvMine, that (1) extracts temporally frequent peak phrases using a novel ttf-itf score, (2) merges peak phrases into event-indicative feature sets by detecting communities from our designed peak phrase graph that captures document co-occurrences, semantic similarities, and temporal closeness signals, and (3) iteratively retrieves documents related to each key event by training a classifier with automatically generated pseudo labels from the event-indicative feature sets and refining the detected key events using the retrieved documents in each iteration. Extensive experiments and case studies show EvMine outperforms all the baseline methods and its ablations on two real-world news corpora.
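The abstract names a ttf-itf score for finding temporally frequent peak phrases but does not give its formula. The sketch below is only an illustration built by analogy with tf-idf, and every detail of it is an assumption: the expansion "temporal term frequency times inverse time frequency", the one-day time unit, the logarithmic weighting, and the function name ttf_itf are all hypothetical and may differ from what EvMine actually uses.

```python
import math
from collections import Counter, defaultdict


def ttf_itf(docs):
    """Score each (phrase, day) pair with a tf-idf-style temporal weight.

    docs: iterable of (day, phrases) pairs, one per document.
    Returns a dict mapping (phrase, day) to a float score.
    ASSUMPTION: this formula is an analogy to tf-idf, not the paper's definition.
    """
    # Temporal term frequency: occurrences of each phrase per day.
    ttf = defaultdict(Counter)
    for day, phrases in docs:
        ttf[day].update(phrases)

    num_days = len(ttf)

    # Number of distinct days on which each phrase appears.
    days_with = Counter()
    for counts in ttf.values():
        days_with.update(counts.keys())

    scores = {}
    for day, counts in ttf.items():
        for phrase, freq in counts.items():
            # Inverse time frequency: phrases seen on few days weigh more.
            itf = math.log(num_days / days_with[phrase])
            scores[(phrase, day)] = freq * itf
    return scores
```

Under this assumed weighting, a phrase that appears on every day of the corpus scores zero, while a phrase concentrated on a single day scores highest on that day, which is the intuition behind treating it as a candidate peak phrase.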
Pages: 2535-2544
Number of pages: 10
Related papers
50 in total
  • [1] Automated Phrase Mining from Massive Text Corpora
    Shang, Jingbo
    Liu, Jialu
    Jiang, Meng
    Ren, Xiang
    Voss, Clare R.
    Han, Jiawei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (10) : 1825 - 1837
  • [2] Mining Quality Phrases from Massive Text Corpora
    Liu, Jialu
    Shang, Jingbo
    Wang, Chi
    Ren, Xiang
    Han, Jiawei
    SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015 : 1729 - 1744
  • [3] Discovery of event entailment knowledge from text corpora
    Pekar, Viktor
    COMPUTER SPEECH AND LANGUAGE, 2008, 22 (01) : 1 - 16
  • [4] Unsupervised Anomaly Detection in Multi-Topic Short-Text Corpora
    Ait-Saada, Mira
    Nadif, Mohamed
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023 : 1392 - 1403
  • [5] Constructing Structured Information Networks from Massive Text Corpora
    Ren, Xiang
    Jiang, Meng
    Shang, Jingbo
    Han, Jiawei
    WWW'17 COMPANION: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2017 : 951 - 954
  • [6] MetaPAD: Meta Pattern Discovery from Massive Text Corpora
    Jiang, Meng
    Shang, Jingbo
    Cassidy, Taylor
    Ren, Xiang
    Kaplan, Lance M.
    Hanratty, Timothy P.
    Han, Jiawei
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017 : 877 - 886
  • [7] Unsupervised Phrasal Near-Synonym Generation from Text Corpora
    Gupta, Dishan
    Carbonell, Jaime
    Gershman, Anatole
    Klein, Steve
    Miller, David
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015 : 2253 - 2259
  • [8] Unsupervised event exploration from social text streams
    Zhou, Deyu
    Chen, Liangyu
    Zhang, Xuan
    He, Yulan
    INTELLIGENT DATA ANALYSIS, 2017, 21 (04) : 849 - 866
  • [9] Building Structured Databases of Factual Knowledge from Massive Text Corpora
    Ren, Xiang
    Jiang, Meng
    Shang, Jingbo
    Han, Jiawei
    SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017 : 1741 - 1745
  • [10] An Unsupervised Approach for the Detection of Outliers in Corpora
    Guthrie, David
    Guthrie, Louise
    Wilks, Yorick
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008 : 3409 - 3413