Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

Authors
Pasunuru, Ramakanth [1]
Celikyilmaz, Asli [2]
Galley, Michel [2]
Xiong, Chenyan [2]
Zhang, Yizhe [2]
Bansal, Mohit [1]
Gao, Jianfeng [2]
Affiliations
[1] University of North Carolina, Chapel Hill, NC 27599, USA
[2] Microsoft Research, Redmond, WA, USA
Funding
U.S. National Science Foundation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. To cover both these real summary and query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as on human evaluations along multiple attributes.
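The complementary properties described in the abstract (QMDSCNN: real summaries, simulated queries; QMDSIR: real queries, simulated summaries) can be illustrated with a small data-model sketch. All names here (QMDSExample, make_cnn_example, make_ir_example, combine, coverage) are hypothetical. The sketch does not reproduce how the authors simulate queries or summaries, and the plain concatenate-and-shuffle merge is an assumption for illustration, not the paper's training recipe.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class QMDSExample:
    """One query-focused multi-document training example (hypothetical schema)."""
    query: str
    documents: Tuple[str, ...]  # several source documents per example
    summary: str
    source: str                 # "QMDSCNN" or "QMDSIR"
    real_query: bool            # True if the query comes from real users
    real_summary: bool          # True if the summary was written by humans


def make_cnn_example(simulated_query: str, documents: Sequence[str],
                     reference_summary: str) -> QMDSExample:
    # QMDSCNN side: human-written summary, simulated query.
    return QMDSExample(simulated_query, tuple(documents), reference_summary,
                       "QMDSCNN", real_query=False, real_summary=True)


def make_ir_example(logged_query: str, documents: Sequence[str],
                    simulated_summary: str) -> QMDSExample:
    # QMDSIR side: real search query, simulated summary.
    return QMDSExample(logged_query, tuple(documents), simulated_summary,
                       "QMDSIR", real_query=True, real_summary=False)


def combine(cnn_examples: Sequence[QMDSExample],
            ir_examples: Sequence[QMDSExample],
            seed: int = 0) -> List[QMDSExample]:
    """Merge both sets and shuffle them (assumed merge strategy, not the paper's)."""
    combined = list(cnn_examples) + list(ir_examples)
    random.Random(seed).shuffle(combined)  # deterministic for a fixed seed
    return combined


def coverage(examples: Sequence[QMDSExample]) -> Tuple[bool, bool]:
    """Return (has_real_queries, has_real_summaries) for a set of examples."""
    return (any(e.real_query for e in examples),
            any(e.real_summary for e in examples))


if __name__ == "__main__":
    cnn = [make_cnn_example("simulated query", ["doc a", "doc b"], "human summary")]
    ir = [make_ir_example("logged query", ["doc c", "doc d"], "simulated summary")]
    print(coverage(cnn), coverage(ir), coverage(combine(cnn, ir)))
```

Run as a script, this prints `(False, True) (True, False) (True, True)`: neither source alone has both real queries and real summaries, while the combined set covers both.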
Pages: 13666-13674
Number of pages: 9
Related papers
  • [31] Query-focused multi-document summarization using co-training based semi-supervised learning
    Hu, Po
    Ji, Donghong
    Wang, Hai
    Teng, Chong
    PACLIC 23: Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009, 1: 190-199
  • [32] Mutually Reinforced Manifold-Ranking Based Relevance Propagation Model for Query-Focused Multi-Document Summarization
    Cai, Xiaoyan
    Li, Wenjie
    IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(5): 1597-1607
  • [33] Unsupervised Query-Focused Multi-document Summarization Using uSIF Sentence Embedding Model and Maximal Marginal Relevance Criterion
    Lamsiyah, Salima
    El Mahdaouy, Abdelkader
    Espinasse, Bernard
    El Alaoui, Said Ouatik
    Advanced Intelligent Systems for Sustainable Development (AI2SD'2020), Vol. 2, 2022, 1418: 798-808
  • [34] Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization
    Li, Miao
    Qi, Jianzhong
    Lau, Jey Han
    Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 11, 2023: 13085-13093
  • [35] Entity-Aware Abstractive Multi-Document Summarization
    Zhou, Hao
    Ren, Weidong
    Liu, Gongshen
    Su, Bo
    Lu, Wei
    Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021: 351-362
  • [36] Topic-Guided Abstractive Multi-Document Summarization
    Cui, Peng
    Hu, Le
    Findings of the Association for Computational Linguistics: EMNLP 2021, 2021: 1463-1472
  • [37] Bayesian Query-Focused Summarization
    Daume, Hal, III
    Marcu, Daniel
    COLING/ACL 2006, Vols. 1 and 2, Proceedings of the Conference, 2006: 305-312
  • [38] Dual pattern-enhanced representations model for query-focused multi-document summarisation
    Wu, Yutong
    Li, Yuefeng
    Xu, Yue
    Knowledge-Based Systems, 2019, 163: 736-748
  • [39] Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization
    Jin, Hanqi
    Wan, Xiaojun
    Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 2545-2554
  • [40] QTSUMM: Query-Focused Summarization over Tabular Data
    Zhao, Yilun
    Qi, Zhenting
    Nan, Linyong
    Mi, Boyu
    Liu, Yixin
    Zou, Weijin
    Han, Simeng
    Chen, Ruizhe
    Tang, Xiangru
    Xu, Yumo
    Radev, Dragomir
    Cohan, Arman
    2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, 2023: 1157-1172