Comparative Document Summarisation via Classification

Authors
Bista, Umanga [1, 2]
Mathews, Alexander [1, 2]
Shin, Minjeong [1, 2]
Menon, Aditya Krishna [1, 3]
Xie, Lexing [1, 2]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Data Decis CRC, Canberra, ACT, Australia
[3] Google Res, Canberra, ACT, Australia
Source
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2019
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics over 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 15 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct viewpoints.
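As a rough illustration of the maximum mean discrepancy (MMD) idea mentioned in the abstract, the following Python sketch computes a biased estimate of squared MMD with an RBF kernel and uses it to pick a few documents that are close to their own group and far from another group. This is not the paper's method: the function names, the `gamma` and `lam` parameters, the difference-of-MMDs score, and the use of greedy discrete selection (rather than the gradient-based strategy the abstract describes) are all assumptions made for illustration, and documents are assumed to be given as fixed-length feature vectors.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=0.5):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


def greedy_comparative_summary(group, others, k, lam=1.0, gamma=0.5):
    """Greedily pick k row indices of `group` (hypothetical helper).

    Each step adds the document that most improves an assumed score:
    low MMD to its own group (representative) and high MMD to the
    other group (distinguishable), traded off by `lam`.
    """
    chosen = []
    for _ in range(k):
        best_idx, best_score = None, -np.inf
        for i in range(len(group)):
            if i in chosen:
                continue
            candidate = group[chosen + [i]]
            score = -mmd2(candidate, group, gamma) + lam * mmd2(candidate, others, gamma)
            if score > best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return chosen
```

For example, calling `greedy_comparative_summary(group_a, group_b, k=3)` on two arrays of document embeddings returns three row indices into `group_a`. A gradient-based variant would instead optimise continuous prototype vectors against a similar score; that is left out here because the abstract does not give its details.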
Pages: 20-28
Page count: 9