OCCAMS - An Optimal Combinatorial Covering Algorithm for Multi-document Summarization

Cited by: 20
Authors
Davis, Sashka T. [1 ]
Conroy, John M. [1 ]
Schlesinger, Judith D. [1 ]
Affiliations
[1] IDA Ctr Comp Sci, Bowie, MD USA
DOI
10.1109/ICDMW.2012.50
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
OCCAMS is a new algorithm for the Multi-Document Summarization (MDS) problem. We use Latent Semantic Analysis (LSA) to produce term weights that identify the main theme(s) of a set of documents. These weights are used by our heuristic for extractive sentence selection, which borrows techniques from combinatorial optimization to select a set of sentences such that the combined weight of the terms covered is maximized while redundancy is minimized. OCCAMS outperforms CLASSY11 on DUC/TAC data for nearly all years since 2005, where CLASSY11 is the best human-rated system of TAC 2011. OCCAMS also delivers higher ROUGE scores than all human-generated summaries for TAC 2011. We show that if the combinatorial component of OCCAMS, which computes the extractive summary, is given the true weights of terms, then the summaries it generates outperform all human-generated summaries for all years under ROUGE-2, ROUGE-SU4, and a coverage metric. We introduce this new metric, based on term coverage, and demonstrate that a simple bi-gram instantiation achieves a statistically significantly higher Pearson correlation with overall responsiveness than ROUGE on the TAC data.
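The selection objective described in the abstract (maximize the combined weight of covered terms within a summary length limit, without rewarding repeated terms) can be illustrated with a simple greedy routine. The sketch below is not the OCCAMS algorithm itself, whose details are not given in this record; it is a minimal greedy heuristic for weighted coverage under a length budget. The function names `greedy_cover` and `coverage_score`, the toy term weights, and the word-count budget are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_cover(sentences: List[Set[str]], lengths: List[int],
                 weights: Dict[str, float], budget: int) -> List[int]:
    """Greedily pick sentence indices by newly covered term weight per word.

    Illustrative heuristic only (hypothetical helper, not the published method).
    Assumes every entry of `lengths` is a positive word count.
    """
    chosen: List[int] = []
    covered: Set[str] = set()
    used = 0
    while True:
        best, best_ratio = None, 0.0
        for i, terms in enumerate(sentences):
            if i in chosen or used + lengths[i] > budget:
                continue
            # Only terms not yet covered add weight, so redundant
            # sentences contribute little or nothing.
            gain = sum(weights.get(t, 0.0) for t in terms - covered)
            ratio = gain / lengths[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing fits the budget or adds new weight
            return chosen
        chosen.append(best)
        covered |= sentences[best]
        used += lengths[best]


def coverage_score(chosen: List[int], sentences: List[Set[str]],
                   weights: Dict[str, float]) -> float:
    """Fraction of the total term weight covered by the chosen sentences."""
    covered: Set[str] = set()
    for i in chosen:
        covered |= sentences[i]
    total = sum(weights.values())
    return sum(weights.get(t, 0.0) for t in covered) / total if total else 0.0


# Toy data (assumed): each sentence is reduced to its set of terms.
weights = {"storm": 3.0, "coast": 2.0, "damage": 2.0, "relief": 1.0, "aid": 1.0}
sentences = [{"storm", "coast"}, {"storm", "damage"},
             {"coast", "storm"}, {"relief", "aid"}]
lengths = [4, 5, 5, 4]

picked = greedy_cover(sentences, lengths, weights, budget=10)
print(picked, round(coverage_score(picked, sentences, weights), 3))
```

In this toy run the third sentence repeats the terms of the first, so it adds no new weight and is never selected, which is the redundancy-avoiding behaviour the abstract describes. In the paper the term weights come from LSA, and the coverage metric is instantiated over bi-grams; here both are replaced by hand-picked unigram weights.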
Pages: 454-463
Page count: 10
Related papers
50 in total
  • [31] An Optimization Algorithm for Extractive Multi-document Summarization Based on Association of Sentences
    Chen, Chun-Hao
    Yang, Yi-Chen
    Lin, Jerry Chun-Wei
    ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: THEORY AND PRACTICES IN ARTIFICIAL INTELLIGENCE, 2022, 13343 : 460 - 469
  • [32] Multi-document summarization sentence ordering algorithm using semantic analysis
    Ji, Min
    Liao, Junbi
    Lei, Jingfa
    Yuan, Zhongfan
    Advances in Information Sciences and Service Sciences, 2012, 4 (14): : 125 - 131
  • [33] Generic and Update Multi-Document Text Summarization based on Genetic Algorithm
    Neri-Mendoza, Veronica
    Ledeneva, Yulia
    Arnulfo Garcia-Hernandez, Rene
    Hernandez-Castaneda, Angel
    COMPUTACION Y SISTEMAS, 2023, 27 (01): : 269 - 279
  • [34] A New Memetic Algorithm for Multi-document Summarization Based on CHC Algorithm and Greedy Search
    Mendoza, Martha
    Cobos, Carlos
    Leon, Elizabeth
    Lozano, Manuel
    Rodriguez, Francisco
    Herrera-Viedma, Enrique
    HUMAN-INSPIRED COMPUTING AND ITS APPLICATIONS, PT I, 2014, 8856 : 125 - 138
  • [35] Multi-document summarization based on lexical chains
    Chen, YM
    Wang, XL
    Liu, BQ
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 1937 - 1942
  • [36] Unsupervised Multi-document Summarization with Holistic Inference
    Zhang, Haopeng
    Cho, Sangwoo
    Song, Kaiqiang
    Wang, Xiaoyang
    Wang, Hongwei
    Zhang, Jiawei
    Yu, Dong
    13TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING AND THE 3RD CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, IJCNLP-AACL 2023, 2023, : 123 - 133
  • [37] Automatic multi-document summarization for digital libraries
    Ou Shiyan
    Khoo, Christopher S. G.
    Goh, Dion H.
    PROCEEDINGS OF THE ASIA-PACIFIC CONFERENCE ON LIBRARY & INFORMATION EDUCATION & PRACTICE 2006: PREPARING INFORMATION PROFESSIONALS FOR LEADERSHIP IN THE NEW AGE, 2006, : 72 - +
  • [38] Multi-document summarization for terrorism information extraction
    Wang, Fu Lee
    Yang, Christopher C.
    Shi, Xiaodong
    INTELLIGENCE AND SECURITY INFORMATICS, PROCEEDINGS, 2006, 3975 : 602 - 608
  • [39] Multi-document summarization using closed patterns
    Qiang, Ji-Peng
    Chen, Ping
    Ding, Wei
    Xie, Fei
    Wu, Xindong
    KNOWLEDGE-BASED SYSTEMS, 2016, 99 : 28 - 38
  • [40] Enhancing multi-document summarization using concepts
    Rao, Pattabhi R. K.
    Devi, S. Lalitha
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2018, 43 (02):