Identification of Event and Topic for Multi-document Summarization

Cited by: 1
Authors
Fukumoto, Fumiyo [1]
Suzuki, Yoshimi [1]
Takasu, Atsuhiro [2]
Matsuyoshi, Suguru [3]
Affiliations
[1] Univ Yamanashi, Grad Fac Interdisciplinary Res, Kofu, Yamanashi 4008510, Japan
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Yamanashi, Interdisciplinary Grad Sch Med & Engn, Kofu, Yamanashi, Japan
Keywords
Latent Dirichlet Allocation; Moving Average Convergence/Divergence; Multi-document summarization
DOI
10.1007/978-3-319-43808-5_23
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper focuses on continuous news documents and presents a method for extractive multi-document summarization. Our hypothesis is that salient, key sentences in news documents include words related to the target event and topic of a document. Here, an event and a topic are defined as in the Topic Detection and Tracking (TDT) project: an event is something that occurs at a specific place and time along with all necessary preconditions and unavoidable consequences, and a topic is defined to be "a seminal event or activity along with all directly related events and activities." The difficulty in finding topics is that their word distributions vary. In addition to using the TF-IDF term weighting method to extract event words, we identified topics by using two models: Moving Average Convergence Divergence (MACD) for high-frequency words and Latent Dirichlet Allocation (LDA) for low-frequency words. The method was tested on two datasets, NTCIR-3 Japanese news documents and DUC data, and the results showed the effectiveness of the method.
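As a rough illustration of how MACD can be applied to word frequencies, the sketch below computes the standard MACD histogram (short-term minus long-term exponential moving average, minus a smoothed signal line) over a per-day count series for a single word. Everything here is an assumption for illustration: the function names, the window sizes, the threshold, and the sample counts are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        # Seed with the first value, then blend each new value with the previous average.
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd_histogram(counts, short=3, long=6, signal=2):
    """Return MACD minus its signal line for a per-day word-count series.

    The window sizes are illustrative assumptions, not the paper's settings.
    """
    macd = [s - l for s, l in zip(ema(counts, short), ema(counts, long))]
    return [m - s for m, s in zip(macd, ema(macd, signal))]


# Hypothetical daily counts of one word across ten days of news.
counts = [1, 1, 1, 1, 1, 9, 9, 9, 1, 1]
hist = macd_histogram(counts)

# Treat days whose histogram exceeds a small (assumed) threshold as rising.
THRESHOLD = 0.1
bursty_days = [day for day, h in enumerate(hist) if h > THRESHOLD]
```

With these sample counts, the histogram turns positive on day 5, where the count jumps, because the short-term average reacts faster than the long-term one. A flat series yields a histogram of approximately zero throughout.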
Pages: 304-316
Number of pages: 13