OCCAMS - An Optimal Combinatorial Covering Algorithm for Multi-document Summarization

Cited by: 20
Authors
Davis, Sashka T. [1]
Conroy, John M. [1]
Schlesinger, Judith D. [1]
Affiliations
[1] IDA Ctr Comp Sci, Bowie, MD USA
DOI
10.1109/ICDMW.2012.50
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
OCCAMS is a new algorithm for the Multi-Document Summarization (MDS) problem. We use Latent Semantic Analysis (LSA) to produce term weights that identify the main theme(s) of a set of documents. These weights are used by our heuristic for extractive sentence selection, which borrows techniques from combinatorial optimization to select a set of sentences such that the combined weight of the terms covered is maximized while redundancy is minimized. OCCAMS outperforms CLASSY11 on DUC/TAC data for nearly all years since 2005, where CLASSY11 is the best human-rated system of TAC 2011. OCCAMS also delivers higher ROUGE scores than all human-generated summaries for TAC 2011. We show that if the combinatorial component of OCCAMS, which computes the extractive summary, is given the true term weights, then the summaries it generates outperform all human-generated summaries for all years under ROUGE-2, ROUGE-SU4, and a coverage metric. We introduce this new metric, based on term coverage, and demonstrate that a simple bigram instantiation achieves a statistically significantly higher Pearson correlation with overall responsiveness than ROUGE on the TAC data.
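The selection objective described in the abstract (maximize the combined weight of covered terms under a length limit, counting each term once) can be illustrated with a minimal sketch. This is a plain greedy ratio heuristic, not the OCCAMS procedure itself, whose details the abstract does not give. The function names `greedy_cover` and `coverage_score`, the hand-set weights, and the word budget are all hypothetical; in the paper the weights come from LSA, and the coverage metric is only assumed here to be a weighted fraction of covered terms.

```python
from typing import Dict, List, Set


def greedy_cover(sentences: List[Set[str]], lengths: List[int],
                 weights: Dict[str, float], budget: int) -> List[int]:
    """Greedily pick sentence indices under a total length budget.

    Each step takes the sentence with the best ratio of newly covered
    term weight to length. Terms that are already covered contribute
    nothing, which is what discourages redundant sentences.
    Assumes every length is a positive integer.
    """
    chosen: List[int] = []
    covered: Set[str] = set()
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            if used + lengths[i] > budget:
                continue  # sentence no longer fits in the summary
            gain = sum(weights.get(t, 0.0) for t in sentences[i] - covered)
            ratio = gain / lengths[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits, or nothing adds new weight
        chosen.append(best)
        covered |= sentences[best]
        used += lengths[best]
        remaining.discard(best)
    return chosen


def coverage_score(summary_terms: Set[str], weights: Dict[str, float]) -> float:
    """Fraction of the total term weight that the summary covers."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w for t, w in weights.items() if t in summary_terms) / total


if __name__ == "__main__":
    # Hand-set illustrative weights (assumption; not LSA output).
    weights = {"storm": 3.0, "coast": 2.0, "damage": 2.0, "rescue": 1.0}
    sentences = [{"storm", "coast"}, {"storm", "damage"}, {"damage", "rescue"}]
    lengths = [5, 5, 5]
    picked = greedy_cover(sentences, lengths, weights, budget=10)
    terms = set().union(*(sentences[i] for i in picked))
    print(picked, coverage_score(terms, weights))
```

In this toy input the second sentence loses to the third after the first is chosen, because "storm" is already covered and only "damage" would be new. A pure greedy rule like this carries no optimality guarantee on its own; it is shown only to make the objective concrete.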
Pages: 454-463
Page count: 10
Related papers
50 in total
  • [41] Mixture of Topic Model for Multi-document Summarization
    Liu Na
    Li Ming-xia
    Lu Ying
    Tang Xiao-jun
    Wang Hai-wen
    Xiao Peng
    26TH CHINESE CONTROL AND DECISION CONFERENCE (2014 CCDC), 2014, : 5168 - 5172
  • [42] Disentangling Specificity for Abstractive Multi-document Summarization
    Ma, Congbo (congbo.ma@mq.edu.au), 1600, Institute of Electrical and Electronics Engineers Inc.
  • [43] A Game Theory Approach for Multi-document Summarization
    Ahmad, Amreen
    Ahmad, Tanvir
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44 (04) : 3655 - 3667
  • [44] Multi-document summarization based on unsupervised clustering
    Ji, Paul
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2006, 4182 : 560 - 566
  • [45] Geodesic Distance based Multi-document Summarization
    Ma, Huifang
    He, Qing
    Shi, Zhongzhi
    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 54 - 59
  • [46] A Hybrid Topic Model for Multi-Document Summarization
    Xu, JinAn
    Liu, JiangMing
    Araki, Kenji
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (05) : 1089 - 1094
  • [47] MRS for multi-document summarization by sentence extraction
    Yong-Dong Xu
    Xiao-Dong Zhang
    Guang-Ri Quan
    Ya-Dong Wang
    Telecommunication Systems, 2013, 53 : 91 - 98
  • [48] Personalized Multi-Document Summarization in information retrieval
    Yang, Xiao-Peng
    Liu, Xiao-Rong
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 4108 - +
  • [49] Multi-document Summarization using Tensor Decomposition
    Litvak, Marina
    Vanetik, Natalia
    COMPUTACION Y SISTEMAS, 2014, 18 (03) : 581 - 589
  • [50] Multi-document Summarization for E-Learning
    Wang, Fu Lee
    Kwan, Reggie
    Hung, Sheung Lun
    HYBRID LEARNING AND EDUCATION, PROCEEDINGS, 2009, 5685 : 353 - +