OCCAMS - An Optimal Combinatorial Covering Algorithm for Multi-document Summarization

Cited by: 20
Authors
Davis, Sashka T. [1 ]
Conroy, John M. [1 ]
Schlesinger, Judith D. [1 ]
Affiliations
[1] IDA Ctr Comp Sci, Bowie, MD USA
Keywords
DOI
10.1109/ICDMW.2012.50
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
OCCAMS is a new algorithm for the Multi-Document Summarization (MDS) problem. We use Latent Semantic Analysis (LSA) to produce term weights which identify the main theme(s) of a set of documents. These weights are used by our heuristic for extractive sentence selection, which borrows techniques from combinatorial optimization to select a set of sentences such that the combined weight of the terms covered is maximized while redundancy is minimized. OCCAMS outperforms CLASSY11 on DUC/TAC data for nearly all years since 2005, where CLASSY11 is the best human-rated system of TAC 2011. OCCAMS also delivers higher ROUGE scores than all human-generated summaries for TAC 2011. We show that if the combinatorial component of OCCAMS, which computes the extractive summary, is given true weights of terms, then the quality of the summaries generated outperforms all human-generated summaries for all years using ROUGE-2, ROUGE-SU4, and a coverage metric. We introduce this new metric based on term coverage and demonstrate that a simple bi-gram instantiation achieves a statistically significantly higher Pearson correlation with overall responsiveness than ROUGE on the TAC data.
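The selection objective described in the abstract (maximize the combined weight of covered terms within a length limit, counting each term only once) can be illustrated with a simple greedy heuristic. This is a minimal sketch under stated assumptions, not the OCCAMS algorithm itself: the function name `greedy_cover`, the word-count budget, the gain-per-word selection rule, and the toy term weights are all hypothetical, and the weights are supplied by hand rather than computed with LSA.

```python
def greedy_cover(sentences, term_weights, budget):
    """Greedily pick sentences that add the most uncovered term weight per word.

    sentences    -- list of token lists (one list per candidate sentence)
    term_weights -- dict mapping term -> non-negative weight (assumed given)
    budget       -- maximum total number of words in the summary
    Returns (chosen sentence indices in selection order, set of covered terms).
    """
    covered = set()
    chosen = []
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            length = len(sentences[i])
            if length == 0 or used + length > budget:
                continue  # empty sentence, or it would exceed the budget
            # Only terms not yet covered contribute, which penalizes redundancy.
            new_terms = set(sentences[i]) - covered
            gain = sum(term_weights.get(t, 0.0) for t in new_terms)
            ratio = gain / length
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits or nothing adds weight
        chosen.append(best)
        covered |= set(sentences[best])
        used += len(sentences[best])
        remaining.remove(best)
    return chosen, covered


def covered_weight(covered, term_weights):
    """Total weight of the covered terms (a toy coverage score)."""
    return sum(term_weights.get(t, 0.0) for t in covered)
```

Because a term's weight is credited only the first time it is covered, a sentence that repeats already-selected content gains little and is unlikely to be picked, which is how this sketch trades coverage against redundancy.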
Pages: 454-463
Page count: 10
Related papers
50 in total
  • [1] Genetic algorithm based multi-document summarization
    Liu, Dexi
    He, Yanxiang
    Ji, Donghong
    Yang, Hua
    PRICAI 2006: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4099 : 1140 - 1144
  • [2] A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization
    Parnell, Jacob
    Unanue, Inigo Jauregi
    Piccardi, Massimo
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5112 - 5128
  • [3] Multi-document Summarization Algorithm based on Significance Sentences
    Liu Na
    Lu Ying
    Tang Xiao-Jun
    Wang Hai-Wen
    Xiao Peng
    Li Ming-Xia
    PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 3847 - 3852
  • [4] Topic-Sensitive Multi-document Summarization Algorithm
    Liu Na
    Di Tang
    Lu Ying
    Tang Xiao-jun
    Wang Hai-wen
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 12 (04) : 1375 - 1389
  • [5] Topic-Sensitive Multi-document Summarization Algorithm
    Liu Na
    Tang Xiao-jun
    Lu Ying
    Li Ming-xia
    Wang Hai-wen
    Xiao Peng
    2014 SIXTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND PROGRAMMING (PAAP), 2014, : 69 - 74
  • [6] MULTI-DOCUMENT VIDEO SUMMARIZATION
    Wang, Feng
    Merialdo, Bernard
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 1326 - 1329
  • [7] On redundancy in multi-document summarization
    Calvo, Hiram
    Carrillo-Mendoza, Pabel
    Gelbukh, Alexander
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 34 (05) : 3245 - 3255
  • [8] Abstractive Multi-Document Summarization
    Ranjitha, N. S.
    Kallimani, Jagadish S.
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 1690 - 1693
  • [9] Research on sentence optimum selection algorithm for multi-document summarization
    Zhang, Shu
    Zhao, Tie-Jun
    Yao, Chao
    Zheng, De-Quan
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2008, 30 (12) : 2921 - 2925
  • [10] Multi-document extractive text summarization based on firefly algorithm
    Tomer, Minakshi
    Kumar, Manoj
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (08) : 6057 - 6065