OCCAMS - An Optimal Combinatorial Covering Algorithm for Multi-document Summarization

Cited by: 20
Authors
Davis, Sashka T. [1 ]
Conroy, John M. [1 ]
Schlesinger, Judith D. [1 ]
Affiliations
[1] IDA Ctr Comp Sci, Bowie, MD USA
Keywords
DOI
10.1109/ICDMW.2012.50
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
OCCAMS is a new algorithm for the Multi-Document Summarization (MDS) problem. We use Latent Semantic Analysis (LSA) to produce term weights that identify the main theme(s) of a set of documents. These weights are used by our heuristic for extractive sentence selection, which borrows techniques from combinatorial optimization to select a set of sentences such that the combined weight of the terms covered is maximized while redundancy is minimized. OCCAMS outperforms CLASSY11 on DUC/TAC data for nearly all years since 2005, where CLASSY11 is the best human-rated system of TAC 2011. OCCAMS also delivers higher ROUGE scores than all human-generated summaries for TAC 2011. We show that if the combinatorial component of OCCAMS, which computes the extractive summary, is given the true weights of terms, then the summaries it generates outperform all human-generated summaries for all years under ROUGE-2, ROUGE-SU4, and a coverage metric. We introduce this new metric, based on term coverage, and demonstrate that a simple bigram instantiation achieves a statistically significantly higher Pearson correlation with overall responsiveness than ROUGE on the TAC data.
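The selection idea in the abstract (choose sentences so that the total weight of covered terms is large while repeated terms earn nothing extra) can be illustrated with a minimal sketch. This is not the authors' OCCAMS procedure: it assumes a plain greedy rule (best newly covered weight per word, under a word budget), and the function names, toy sentences, and hand-set weights are all hypothetical. In OCCAMS the weights would come from LSA.

```python
from typing import Dict, List, Set


def greedy_cover(sentences: List[List[str]],
                 weights: Dict[str, float],
                 budget: int) -> List[int]:
    """Greedily pick sentence indices within a word budget.

    Assumption (not from the paper): each step takes the sentence with the
    highest ratio of newly covered term weight to sentence length.
    """
    covered: Set[str] = set()
    chosen: List[int] = []
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            length = len(sentences[i])
            if length == 0 or used + length > budget:
                continue  # empty, or would exceed the summary length limit
            # Only terms not yet covered contribute, which penalizes redundancy.
            gain = sum(weights.get(t, 0.0) for t in set(sentences[i]) - covered)
            ratio = gain / length
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits or nothing adds weight
        chosen.append(best)
        covered |= set(sentences[best])
        used += len(sentences[best])
        remaining.remove(best)
    return chosen


def covered_weight(sentences: List[List[str]],
                   weights: Dict[str, float],
                   chosen: List[int]) -> float:
    """Total weight of distinct terms covered by the chosen sentences."""
    covered: Set[str] = set()
    for i in chosen:
        covered |= set(sentences[i])
    return sum(weights.get(t, 0.0) for t in covered)


# Hypothetical toy input: tokenized sentences and hand-set term weights.
sentences = [
    ["storm", "hits", "coast"],
    ["storm", "hits", "coast", "hard"],
    ["rescue", "teams", "arrive"],
    ["weather", "is", "nice"],
]
weights = {"storm": 3.0, "coast": 2.0, "rescue": 2.5, "teams": 1.0, "hits": 0.5}

summary = greedy_cover(sentences, weights, budget=6)
print(summary, covered_weight(sentences, weights, summary))
```

With this toy input the near-duplicate second sentence is skipped, since it no longer fits the budget and would add no new weighted terms. A greedy rule like this carries no optimality guarantee on its own; it only shows the weighted-coverage objective.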
Pages: 454-463
Number of pages: 10
Related papers
50 in total
  • [11] Abstractive Multi-Document Text Summarization Using a Genetic Algorithm
    Neri Mendoza, Veronica
    Ledeneva, Yulia
    Arnulfo Garcia-Hernandez, Rene
    PATTERN RECOGNITION, MCPR 2019, 2019, 11524 : 422 - 432
  • [12] MSBGA: A multi-document summarization system based on genetic algorithm
    He, Yan-Xiang
    Liu, De-Xi
    Ji, Dong-Hong
    Yang, Hua
    Teng, Chong
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 2659 - +
  • [13] Weighted consensus multi-document summarization
    Wang, Dingding
    Li, Tao
    INFORMATION PROCESSING & MANAGEMENT, 2012, 48 (03) : 513 - 523
  • [14] MULTI-DOCUMENT SUMMARIZATION SYSTEMS COMPARISON
    Li, Lei
    Heng, Wei
    Liu, Ping'an
    2012 IEEE 2nd International Conference on Cloud Computing and Intelligent Systems (CCIS) Vols 1-3, 2012, : 1409 - 1413
  • [15] Multi-Document Summarization for Turkish News
    Demirci, Ferhat
    Karabudak, Engin
    Ilgen, Bahar
    2017 INTERNATIONAL ARTIFICIAL INTELLIGENCE AND DATA PROCESSING SYMPOSIUM (IDAP), 2017,
  • [16] Multi-document summarization via submodularity
    Li, Jingxuan
    Li, Lei
    Li, Tao
    APPLIED INTELLIGENCE, 2012, 37 (03) : 420 - 430
  • [17] Multi-document text summarization - A survey
    Tandel, Amol
    Modi, Brijesh
    Gupta, Priyasha
    Wagle, Shreya
    Khedkar, Sujata
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON DATA MINING AND ADVANCED COMPUTING (SAPIENCE), 2016, : 336 - 339
  • [18] An Overview of Research on Multi-Document Summarization
    Bao R.
    Sun H.
    Data Analysis and Knowledge Discovery, 2024, 8 (02) : 17 - 32
  • [19] Multi-document summarization via submodularity
    Jingxuan Li
    Lei Li
    Tao Li
    Applied Intelligence, 2012, 37 : 420 - 430
  • [20] Multi-Document Summarization by Information Distance
    Long, Chong
    Huang, Minlie
    Zhu, Xiaoyan
    Li, Ming
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 866 - +