Unsupervised Key Event Detection from Massive Text Corpora

Cited by: 6
Authors
Zhang, Yunyi [1]
Guo, Fang [2]
Shen, Jiaming [3]
Han, Jiawei [1]
Affiliations
[1] UIUC, Champaign, IL 61820 USA
[2] Westlake Univ, Hangzhou, Peoples R China
[3] Google Res, New York, NY USA
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Funding
U.S. National Science Foundation;
Keywords
Unsupervised Event Detection; Document Classification; Pretrained Language Models; Phrase Extraction;
DOI
10.1145/3534678.3539395
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge. As real-world events have different granularities, from the top-level themes to key events and then to event mentions corresponding to concrete actions, there are generally two lines of research: (1) theme detection tries to identify from a news corpus major themes (e.g., "2019 Hong Kong Protests" versus "2020 U.S. Presidential Election") which have very distinct semantics; and (2) action extraction aims to extract from a single document mention-level actions (e.g., "the police hit the left arm of the protester") that are often too fine-grained for comprehending the real-world event. In this paper, we propose a new task, key event detection at the intermediate level, which aims to detect from a news corpus key events (e.g., HK Airport Protest on Aug. 12-14), each happening at a particular time/location and focusing on the same topic. This task can bridge event understanding and structuring and is inherently challenging because of (1) the thematic and temporal closeness of different key events and (2) the scarcity of labeled data due to the fast-evolving nature of news articles. To address these challenges, we develop an unsupervised key event detection framework, EvMine, that (1) extracts temporally frequent peak phrases using a novel ttf-itf score, (2) merges peak phrases into event-indicative feature sets by detecting communities from our designed peak phrase graph that captures document co-occurrences, semantic similarities, and temporal closeness signals, and (3) iteratively retrieves documents related to each key event by training a classifier with automatically generated pseudo labels from the event-indicative feature sets and refining the detected key events using the retrieved documents in each iteration. Extensive experiments and case studies show EvMine outperforms all the baseline methods and its ablations on two real-world news corpora.
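The abstract names a ttf-itf score for finding temporally frequent peak phrases but does not define it in this record. The following is a minimal sketch of one plausible tf-idf-style analogue, in which each publication day plays the role of a document: a phrase scores high on a day when it is frequent that day and appears on few other days. The function name `temporal_peak_scores`, the input layout, and the formula itself are assumptions for illustration and should not be read as EvMine's actual definition.

```python
import math
from collections import Counter, defaultdict


def temporal_peak_scores(docs):
    """Score (phrase, day) pairs with an assumed tf-idf-style temporal score.

    docs: iterable of (day, phrases) pairs, one pair per news article.
    Returns a dict mapping (phrase, day) -> score.
    NOTE: this formula is an illustrative assumption, not the paper's ttf-itf.
    """
    # Aggregate phrase counts per day, so each day acts like one "document".
    counts = defaultdict(Counter)
    for day, phrases in docs:
        counts[day].update(phrases)

    n_days = len(counts)

    # Number of distinct days on which each phrase appears.
    days_with = Counter()
    for day_counts in counts.values():
        days_with.update(day_counts.keys())

    scores = {}
    for day, day_counts in counts.items():
        total = sum(day_counts.values())
        for phrase, c in day_counts.items():
            temporal_tf = c / total                             # frequency within the day
            inverse_tf = math.log(n_days / days_with[phrase])   # rarity across days
            scores[(phrase, day)] = temporal_tf * inverse_tf
    return scores


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ("2019-08-12", ["airport", "protest", "airport"]),
    ("2019-08-13", ["airport", "protest"]),
    ("2019-09-01", ["protest", "election"]),
]
scores = temporal_peak_scores(docs)
```

In this toy corpus, "protest" appears on every day and therefore scores zero everywhere, while "airport" is concentrated on two adjacent days and "election" on a single day. This mirrors the intuition that theme-level phrases are poor indicators of an individual key event.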
Pages: 2535-2544
Number of pages: 10
Related papers
50 in total
  • [31] An unsupervised learning approach to musical event detection
    Gao, S
    Lee, CH
    Zhu, YW
    2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 1307 - 1310
  • [32] An Annotation Schema for the Detection of Social Bias in Legal Text Corpora
    Gumusel, Ece
    Malic, Vincent Quirante
    Donaldson, Devan Ray
    Ashley, Kevin
    Liu, Xiaozhong
    INFORMATION FOR A BETTER WORLD: SHAPING THE GLOBAL FUTURE, PT I, 2022, 13192 : 185 - 194
  • [33] Navigating Massive Text Reports: An Automated Approach to Aviation Safety Reporting System Safety Event Detection
    Dou, Zhi
    Keller, Julius
    Gao, Yi
    TRANSPORTATION RESEARCH RECORD, 2024, : 1706 - 1719
  • [34] Unsupervised Ontology Induction from Text
    Poon, Hoifung
    Domingos, Pedro
    ACL 2010: 48TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2010, : 296 - 305
  • [35] Scalable Topical Phrase Mining from Text Corpora
    El-Kishky, Ahmed
    Song, Yanglei
    Wang, Chi
    Voss, Clare R.
    Han, Jiawei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 8 (03): : 305 - 316
  • [36] Automatic Maneuver Boundary Detection System for Naturalistic Driving Massive Corpora
    Sathyanarayana, Amardeep
    Sadjadi, Seyed Omid
    Hansen, John H. L.
    SAE INTERNATIONAL JOURNAL OF PASSENGER CARS-ELECTRONIC AND ELECTRICAL SYSTEMS, 2014, 7 (01): : 149 - 156
  • [37] Discovering the Ebb and Flow of Ideas from Text Corpora
    Jee, Justin
    Klippel, Lee Case
    Hossain, M. Shahriar
    Ramakrishnan, Naren
    Mishra, Bud
    COMPUTER, 2012, 45 (02) : 73 - 77
  • [38] Extracting semantic representations from large text corpora
    Patel, M
    Bullinaria, JA
    Levy, JP
    4TH NEURAL COMPUTATION AND PSYCHOLOGY WORKSHOP, LONDON, 9-11 APRIL 1997: CONNECTIONIST REPRESENTATIONS, 1997, : 199 - 212
  • [39] Unsupervised Event Detection with Infinite Poisson Mixture Model
    Hegde, Vinod
    Krnjajic, Milovan
    Pozdnoukhov, Alexei
    2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 567 - 575
  • [40] Experimenting with Unsupervised Multilingual Event Detection in Historical Newspapers
    Boros, Emanuela
    Cabrera-Diego, Luis Adrian
    Doucet, Antoine
    FROM BORN-PHYSICAL TO BORN-VIRTUAL: AUGMENTING INTELLIGENCE IN DIGITAL LIBRARIES, ICADL 2022, 2022, 13636 : 182 - 193