Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization

Cited by: 6
Authors
Lin, Dengtian [1]
Jing, Liqiang [1]
Song, Xuemeng [1]
Liu, Meng [2]
Sun, Teng [1]
Nie, Liqiang [3]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Shandong Jianzhu Univ, Jinan, Peoples R China
[3] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal Summarization; Pre-trained Language Model; Prompt Learning;
DOI
10.1145/3539618.3591633
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its paired image, is a new yet challenging task. Although existing methods have achieved compelling success, they suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they lack explicit modeling of critical information. To address these limitations, we propose the BART-MMSS framework, which adopts BART as its backbone. Specifically, we propose a prompt-guided image encoding module to extract source image features: it leverages several learnable soft prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS. We then devise an explicit source critical token learning module that directly captures the critical tokens of the source sentence with reference to the source image, incorporating explicit supervision to improve performance. Extensive experiments on a public dataset validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module are easily understood by humans, which improves the interpretability of our model.
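The prompt-guided visual injection sketched in the abstract can be illustrated with a short example. The following is a minimal PyTorch sketch, not the authors' released code: it prepends a set of learnable soft prompt vectors to ViT-style image patch embeddings and projects the result into BART's embedding space so it can be consumed alongside text token embeddings. All names here (PromptGuidedImageEncoder, n_prompts, patch_dim, bart_dim) are illustrative assumptions.

    # Illustrative sketch of prompt-guided image encoding; module and
    # parameter names are assumptions, not the paper's actual code.
    import torch
    import torch.nn as nn

    class PromptGuidedImageEncoder(nn.Module):
        def __init__(self, n_prompts: int = 8, patch_dim: int = 768, bart_dim: int = 768):
            super().__init__()
            # Learnable soft prompts, one vector per prompt slot.
            self.prompts = nn.Parameter(torch.randn(n_prompts, patch_dim) * 0.02)
            # Projection of visual features into BART's embedding space.
            self.proj = nn.Linear(patch_dim, bart_dim)

        def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
            # patch_embeds: (batch, n_patches, patch_dim), e.g. from a ViT.
            batch = patch_embeds.size(0)
            prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
            # Prepend the soft prompts to the patch sequence, then project.
            visual_seq = torch.cat([prompts, patch_embeds], dim=1)
            return self.proj(visual_seq)  # (batch, n_prompts + n_patches, bart_dim)

    # Usage: the projected visual sequence can be prepended to BART's text
    # token embeddings before the encoder, letting visual content flow
    # into the pre-trained language model.
    encoder = PromptGuidedImageEncoder()
    patches = torch.randn(2, 49, 768)   # e.g. a 7x7 grid of ViT patch features
    visual_tokens = encoder(patches)    # (2, 8 + 49, 768)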
Pages: 195-204
Number of pages: 10
Related Papers
50 records in total
  • [21] CPM: A large-scale generative Chinese Pre-trained language model
    Zhang, Zhengyan
    Han, Xu
    Zhou, Hao
    Ke, Pei
    Gu, Yuxian
    Ye, Deming
    Qin, Yujia
    Su, Yusheng
    Ji, Haozhe
    Guan, Jian
    Qi, Fanchao
    Wang, Xiaozhi
    Zheng, Yanan
    Zeng, Guoyang
    Cao, Huanqi
    Chen, Shengqi
    Li, Daixuan
    Sun, Zhenbo
    Liu, Zhiyuan
    Huang, Minlie
    Han, Wentao
    Tang, Jie
    Li, Juanzi
    Zhu, Xiaoyan
    Sun, Maosong
    AI OPEN, 2021, 2 : 93 - 99
  • [22] Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos
    Liu, Nayu
    Sun, Xian
    Yu, Hongfeng
    Zhang, Wenkai
    Xu, Guangluan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1834 - 1845
  • [23] Hyperbolic Pre-Trained Language Model
    Chen, Weize
    Han, Xu
    Lin, Yankai
    He, Kaichen
    Xie, Ruobing
    Zhou, Jie
    Liu, Zhiyuan
    Sun, Maosong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3101 - 3112
  • [24] Schema matching based on energy domain pre-trained language model
    Pan, Z.
    Yang, M.
    Monti, A.
    ENERGY INFORMATICS, 2023, 6 (Suppl 1)
  • [25] Adapting Pre-trained Language Models to Rumor Detection on Twitter
    Slimi, Hamda
    Bounhas, Ibrahim
    Slimani, Yahya
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2021, 27 (10) : 1128 - 1148
  • [26] Compression of Generative Pre-trained Language Models via Quantization
    Tao, Chaofan
    Hou, Lu
    Zhang, Wei
    Shang, Lifeng
    Jiang, Xin
    Liu, Qun
    Luo, Ping
    Wong, Ngai
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4821 - 4836
  • [27] Structured Pruning for Efficient Generative Pre-trained Language Models
    Tao, Chaofan
    Hou, Lu
    Bai, Haoli
    Wei, Jiansheng
    Jiang, Xin
    Liu, Qun
    Luo, Ping
    Wong, Ngai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 10880 - 10895
  • [28] OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
    Chen, Le
    Bhattacharjee, Arijit
    Ahmed, Nesreen
    Hasabnis, Niranjan
    Oren, Gal
    Vo, Vy
    Jannesari, Ali
    EURO-PAR 2024: PARALLEL PROCESSING, PT I, EURO-PAR 2024, 2024, 14801 : 121 - 134
  • [29] Pre-trained Language Model Representations for Language Generation
    Edunov, Sergey
    Baevski, Alexei
    Auli, Michael
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4052 - 4059
  • [30] scMulan: A Multitask Generative Pre-Trained Language Model for Single-Cell Analysis
    Bian, Haiyang
    Chen, Yixin
    Dong, Xiaomin
    Li, Chen
    Hao, Minsheng
    Chen, Sijie
    Hu, Jinyi
    Sun, Maosong
    Wei, Lei
    Zhang, Xuegong
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024, 2024, 14758 : 479 - 482