Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization

Cited by: 6
Authors
Lin, Dengtian [1 ]
Jing, Liqiang [1 ]
Song, Xuemeng [1 ]
Liu, Meng [2 ]
Sun, Teng [1 ]
Nie, Liqiang [3 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Shandong Jianzhu Univ, Jinan, Peoples R China
[3] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Summarization; Pre-trained Language Model; Prompt Learning;
DOI
10.1145/3539618.3591633
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its paired image, is a new yet challenging task. Although existing methods have achieved compelling success, they suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they lack explicit modeling of critical information. To address these limitations, we propose the BART-MMSS framework, which adopts BART as the backbone. Specifically, we propose a prompt-guided image encoding module to extract the source image feature. It leverages several learnable soft prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS. Thereafter, we devise an explicit source critical token learning module that directly captures the critical tokens of the source sentence with reference to the source image, incorporating explicit supervision to improve performance. Extensive experiments on a public dataset validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module are easily understood by humans, which improves the interpretability of our model.
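The abstract describes the architecture in prose only; the following PyTorch sketch illustrates how its two modules could be wired around a HuggingFace BART backbone. This is a minimal illustration under stated assumptions, not the authors' released code: the names `BartMMSSSketch`, `num_prompts`, `patch_feats`, and `key_token_labels` are hypothetical, and the choice of patch encoder, prompt count, and equal loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BartForConditionalGeneration


class BartMMSSSketch(nn.Module):
    """Illustrative sketch of the two modules described in the abstract."""

    def __init__(self, num_prompts: int = 4, patch_dim: int = 768):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
        d_model = self.bart.config.d_model
        # Prompt-guided image encoding: learnable soft prompts plus a linear
        # projection mapping image-patch features into BART's embedding space.
        self.soft_prompts = nn.Parameter(torch.randn(num_prompts, d_model) * 0.02)
        self.patch_proj = nn.Linear(patch_dim, d_model)
        # Explicit critical-token learning: binary head over source-token states.
        self.key_token_head = nn.Linear(d_model, 2)

    def forward(self, input_ids, attention_mask, patch_feats, labels, key_token_labels):
        # input_ids, attention_mask: (B, T) tokenized source sentence.
        # patch_feats: (B, P, patch_dim) patches from a frozen image encoder (assumed).
        # labels: (B, L) tokenized target summary.
        # key_token_labels: (B, T) long tensor in {0, 1}, -100 on padding positions.
        bsz = input_ids.size(0)
        txt_embeds = self.bart.get_input_embeddings()(input_ids)      # (B, T, d)
        vis_embeds = self.patch_proj(patch_feats)                      # (B, P, d)
        prompts = self.soft_prompts.unsqueeze(0).expand(bsz, -1, -1)   # (B, K, d)
        inputs_embeds = torch.cat([prompts, vis_embeds, txt_embeds], dim=1)
        n_extra = prompts.size(1) + vis_embeds.size(1)
        full_mask = torch.cat(
            [attention_mask.new_ones(bsz, n_extra), attention_mask], dim=1)
        out = self.bart(inputs_embeds=inputs_embeds, attention_mask=full_mask,
                        labels=labels)
        # Supervise critical-token prediction on the text positions only.
        text_states = out.encoder_last_hidden_state[:, n_extra:, :]    # (B, T, d)
        key_logits = self.key_token_head(text_states)
        key_loss = F.cross_entropy(key_logits.reshape(-1, 2),
                                   key_token_labels.reshape(-1), ignore_index=-100)
        return out.loss + key_loss  # summarization loss + auxiliary key-token loss
```

Prepending the soft prompts and projected patch embeddings to the token embeddings injects visual content without modifying BART itself, while the auxiliary cross-entropy term realizes the explicit supervision over critical source tokens that the abstract mentions.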
Pages: 195-204
Number of pages: 10
Related Papers
50 records in total
  • [1] Vision Enhanced Generative Pre-trained Language Model for Multimodal Sentence Summarization
    Jing, Liqiang
    Li, Yiren
    Xu, Junhao
    Yu, Yongcan
    Shen, Pei
    Song, Xuemeng
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (02): 289-298
  • [2] Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
    Yu, Tiezheng
    Dai, Wenliang
    Liu, Zihan
    Fung, Pascale
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 3995-4007
  • [3] Biomedical-domain pre-trained language model for extractive summarization
    Du, Yongping
    Li, Qingxiao
    Wang, Lulin
    He, Yanqing
    KNOWLEDGE-BASED SYSTEMS, 2020, 199
  • [4] Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
    Chen, Wenhu
    Verga, Pat
    de Jong, Michiel
    Wieting, John
    Cohen, William W.
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023: 1597-1610
  • [5] Pre-trained language models with domain knowledge for biomedical extractive summarization
    Xie, Q.
    Bishop, J. A.
    Tiwari, P.
    Ananiadou, S.
    KNOWLEDGE-BASED SYSTEMS, 2022, 252
  • [6] Knowledge Enhanced Pre-trained Language Model for Product Summarization
    Yin, Wenbo
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Liu, Lang
    Cheng, Zhen
    Wang, Sibo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552: 263-273
  • [7] Adapting Pre-trained Generative Model to Medical Image for Data Augmentation
    Yuan, Zhouhang
    Fang, Zhengqing
    Huang, Zhengxing
    Wu, Fei
    Yao, Yu-Feng
    Li, Yingming
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005: 79-89
  • [8] SiBert: Enhanced Chinese Pre-trained Language Model with Sentence Insertion
    Chen, Jiahao
    Cao, Chenjie
    Jiang, Xiuyan
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 2405-2412
  • [9] Evaluating the Summarization Comprehension of Pre-Trained Language Models
    Chernyshev, D. I.
    Dobrov, B. V.
    LOBACHEVSKII JOURNAL OF MATHEMATICS, 2023, 44 (08): 3028-3039