Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

Cited by: 0
Authors
Pasunuru, Ramakanth [1]
Celikyilmaz, Asli [2]
Galley, Michel [2]
Xiong, Chenyan [2]
Zhang, Yizhe [2]
Bansal, Mohit [1]
Gao, Jianfeng [2]
Affiliations
[1] Univ N Carolina, Chapel Hill, NC 27599 USA
[2] Microsoft Res, Redmond, WA USA
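Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract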
The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. To cover both these real summary and query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as on human evaluations along multiple attributes.
Pages: 13666-13674
Number of pages: 9