Unsupervised Key Event Detection from Massive Text Corpora

Cited by: 6
Authors
Zhang, Yunyi [1]
Guo, Fang [2]
Shen, Jiaming [3]
Han, Jiawei [1]
Affiliations
[1] UIUC, Champaign, IL 61820 USA
[2] Westlake Univ, Hangzhou, Peoples R China
[3] Google Res, New York, NY USA
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Funding
U.S. National Science Foundation;
Keywords
Unsupervised Event Detection; Document Classification; Pretrained Language Models; Phrase Extraction;
DOI
10.1145/3534678.3539395
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge. As real-world events have different granularities, from the top-level themes to key events and then to event mentions corresponding to concrete actions, there are generally two lines of research: (1) theme detection tries to identify from a news corpus major themes (e.g., "2019 Hong Kong Protests" versus "2020 U.S. Presidential Election") which have very distinct semantics; and (2) action extraction aims to extract from a single document mention-level actions (e.g., "the police hit the left arm of the protester") that are often too fine-grained for comprehending the real-world event. In this paper, we propose a new task, key event detection at the intermediate level, which aims to detect from a news corpus key events (e.g., HK Airport Protest on Aug. 12-14), each happening at a particular time/location and focusing on the same topic. This task can bridge event understanding and structuring and is inherently challenging because of (1) the thematic and temporal closeness of different key events and (2) the scarcity of labeled data due to the fast-evolving nature of news articles. To address these challenges, we develop an unsupervised key event detection framework, EvMine, that (1) extracts temporally frequent peak phrases using a novel ttf-itf score, (2) merges peak phrases into event-indicative feature sets by detecting communities from our designed peak phrase graph that captures document co-occurrences, semantic similarities, and temporal closeness signals, and (3) iteratively retrieves documents related to each key event by training a classifier with automatically generated pseudo labels from the event-indicative feature sets and refining the detected key events using the retrieved documents in each iteration. Extensive experiments and case studies show EvMine outperforms all the baseline methods and its ablations on two real-world news corpora.
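The abstract names a ttf-itf score for finding temporally frequent peak phrases but does not define it. As a rough illustration only, the sketch below uses an assumed TF-IDF-style analogue in which days play the role of documents: a phrase scores high on a day when it is frequent that day and appears on few days overall. The function name `temporal_peak_scores`, the input layout, and the exact formula are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def temporal_peak_scores(docs):
    """Score (phrase, day) pairs with an assumed TF-IDF-style analogue.

    docs: iterable of (day, phrases) pairs, one per document.
    Returns a dict mapping (phrase, day) to a float score.
    This is an illustrative stand-in, not the paper's ttf-itf definition.
    """
    # Count phrase occurrences per day.
    day_counts = defaultdict(Counter)
    for day, phrases in docs:
        day_counts[day].update(phrases)

    num_days = len(day_counts)

    # Number of distinct days on which each phrase appears.
    days_with = Counter()
    for counts in day_counts.values():
        days_with.update(counts.keys())

    scores = {}
    for day, counts in day_counts.items():
        total = sum(counts.values())
        for phrase, count in counts.items():
            temporal_tf = count / total  # share of the day's phrases
            inverse_time_freq = math.log(num_days / days_with[phrase])
            scores[(phrase, day)] = temporal_tf * inverse_time_freq
    return scores


# Toy corpus: "protest" appears every day, so it never peaks.
docs = [
    ("2019-08-12", ["airport", "protest", "airport"]),
    ("2019-08-13", ["airport", "protest"]),
    ("2019-08-31", ["protest", "subway"]),
]
scores = temporal_peak_scores(docs)
```

On this toy corpus a phrase present on every day receives a score of zero, while phrases concentrated on a few days score higher. The later steps in the abstract (the peak phrase graph and the iterative pseudo-label classifier) are not sketched here.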
Pages: 2535-2544
Page count: 10
Related papers
50 in total
  • [41] AN UNSUPERVISED LEARNING BASED APPROACH FOR UNEXPECTED EVENT DETECTION
    Luvison, Bertrand
    Chateau, Thierry
    Sayd, Patrick
    Pham, Quoc-Cuong
    Lapreste, Jean-Thierry
    VISAPP 2009: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 1, 2009, : 506 - +
  • [42] A Video Text Detection Method Based on Key Text Points
    Li, Zhi
    Liu, Guizhong
    Qian, Xueming
    Wang, Chen
    Ma, Yana
    Yang, Yang
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING-PCM 2010, PT I, 2010, 6297 : 284 - 295
  • [43] A corpora-based detection of stylistic inconsistencies of text in the targeted subgenre
    Hashimoto, Kiyota
    Takeuchi, Kazuhiro
    Ando, Hideaki
    ARTIFICIAL LIFE AND ROBOTICS, 2010, 15 (04) : 486 - 490
  • [44] Discovering Event Evolution Graphs From News Corpora
    Yang, Christopher C.
    Shi, Xiaodong
    Wei, Chih-Ping
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2009, 39 (04): : 850 - 863
  • [45] Unsupervised Domain Ontology Learning from Text
    Venu, Sree Harissh
    Mohan, Vignesh
    Urkalan, Kodaikkaavirinaadan
    Geetha, T., V
    MINING INTELLIGENCE AND KNOWLEDGE EXPLORATION (MIKE 2016), 2017, 10089 : 132 - 143
  • [46] High-performance unsupervised relation extraction from large corpora
    Rozenfeld, Binjamin
    Feldman, Ronen
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 1032 - +
  • [47] Many-to-Many Unsupervised Speech Conversion From Nonparallel Corpora
    Lee, Yun Kyung
    Kim, Hyun Woo
    Park, Jeon Gue
    IEEE ACCESS, 2021, 9 : 27278 - 27286
  • [48] Unsupervised lexicon acquisition from speech and text
    Kurata, Gakuto
    Mori, Shinsuke
    Itoh, Nobuyasu
    Nishimura, Masafumi
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 421 - +
  • [49] Unsupervised Singleton Expansion from Free Text
    Atzori, Maurizio
    Balloccu, Simone
    Bellanti, Andrea
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 180 - 185
  • [50] Unsupervised Text Generation by Learning from Search
    Li, Jingjing
    Li, Zichao
    Mou, Lili
    Jiang, Xin
    Lyu, Michael R.
    King, Irwin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33