Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization

Cited by: 6
Authors
Lin, Dengtian [1 ]
Jing, Liqiang [1 ]
Song, Xuemeng [1 ]
Liu, Meng [2 ]
Sun, Teng [1 ]
Nie, Liqiang [3 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Shandong Jianzhu Univ, Jinan, Peoples R China
[3] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Summarization; Pre-trained Language Model; Prompt Learning;
DOI
10.1145/3539618.3591633
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its paired image, is a new yet challenging task. Although existing methods have achieved compelling success, they still suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they lack explicit modeling of critical information. To address these limitations, we propose a BART-MMSS framework, which adopts BART as the backbone. Specifically, we propose a prompt-guided image encoding module to extract the source image feature. It leverages several learnable soft prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS. Thereafter, we devise an explicit source critical token learning module that directly captures the critical tokens of the source sentence with reference to the source image, where we incorporate explicit supervision to improve performance. Extensive experiments on a public dataset fully validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module are easily understood by humans, which improves the interpretability of our model.
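
The abstract outlines two technical components: learnable soft prompts that guide how projected image patch features are injected into BART's encoder, and an explicitly supervised head that highlights critical source tokens. The sketch below is a minimal, hypothetical rendering of that idea using Hugging Face transformers; it assumes pre-extracted image patch features, and all names and dimensions (e.g., PromptGuidedBartMMSS, num_prompts, patch_dim) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of prompt-guided visual injection into BART plus an explicitly
# supervised key-token head. Hypothetical code, not the paper's released model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BartForConditionalGeneration


class PromptGuidedBartMMSS(nn.Module):
    def __init__(self, bart_name="facebook/bart-base", num_prompts=8, patch_dim=768):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(bart_name)
        hidden = self.bart.config.d_model
        # Learnable soft prompts that steer how visual patches are read by BART.
        self.soft_prompts = nn.Parameter(torch.randn(num_prompts, hidden) * 0.02)
        # Project pre-extracted image patch features into BART's embedding space.
        self.patch_proj = nn.Linear(patch_dim, hidden)
        # Token-level classifier used for explicit critical-token supervision.
        self.key_token_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, patch_feats,
                labels=None, key_token_labels=None):
        bsz = input_ids.size(0)
        tok_emb = self.bart.get_input_embeddings()(input_ids)   # (B, L, H)
        vis_emb = self.patch_proj(patch_feats)                   # (B, P, H)
        prompts = self.soft_prompts.unsqueeze(0).expand(bsz, -1, -1)
        # Encoder input layout: [soft prompts | visual patches | source tokens].
        inputs_embeds = torch.cat([prompts, vis_emb, tok_emb], dim=1)
        extra = prompts.size(1) + vis_emb.size(1)
        full_mask = torch.cat(
            [torch.ones(bsz, extra, dtype=attention_mask.dtype,
                        device=attention_mask.device), attention_mask], dim=1)
        out = self.bart(inputs_embeds=inputs_embeds, attention_mask=full_mask,
                        labels=labels)
        loss = out.loss if labels is not None else None
        if key_token_labels is not None:
            # Score only the source-token positions; key_token_labels are assumed
            # to be per-token binary (pseudo-)labels marking critical tokens.
            enc_states = out.encoder_last_hidden_state[:, extra:, :]
            key_logits = self.key_token_head(enc_states).squeeze(-1)
            key_loss = F.binary_cross_entropy_with_logits(
                key_logits, key_token_labels.float(),
                weight=attention_mask.float())
            loss = key_loss if loss is None else loss + key_loss
        return loss, out.logits
```

In this sketch the summary cross-entropy loss and the key-token loss are simply summed, which is one plausible but assumed way to combine the generation objective with the explicit critical-token supervision described in the abstract.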
Pages: 195-204
Number of pages: 10