Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization

Cited by: 6
Authors
Lin, Dengtian [1]
Jing, Liqiang [1]
Song, Xuemeng [1]
Liu, Meng [2]
Sun, Teng [1]
Nie, Liqiang [3]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Shandong Jianzhu Univ, Jinan, Peoples R China
[3] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal Summarization; Pre-trained Language Model; Prompt Learning;
DOI
10.1145/3539618.3591633
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its paired image, is a new yet challenging task. Although existing methods have achieved compelling success, they suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they lack explicit modeling of critical information. To address these limitations, we propose the BART-MMSS framework, which adopts BART as its backbone. Specifically, we propose a prompt-guided image encoding module to extract source image features: it leverages several learnable soft prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS. We then devise an explicit source critical token learning module that directly captures the critical tokens of the source sentence with reference to the source image, incorporating explicit supervision to improve performance. Extensive experiments on a public dataset validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module are easily understood by humans, which improves the interpretability of our model.
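The prompt-guided visual injection sketched in the abstract can be illustrated with a short example. The following is a minimal PyTorch sketch, not the authors' released code: it prepends a set of learnable soft prompt vectors to ViT-style image patch embeddings and projects the result into BART's embedding space so it can be consumed alongside text token embeddings. All names here (PromptGuidedImageEncoder, n_prompts, patch_dim, bart_dim) are illustrative assumptions.

    # Illustrative sketch of prompt-guided image encoding; module and
    # parameter names are assumptions, not the paper's actual code.
    import torch
    import torch.nn as nn

    class PromptGuidedImageEncoder(nn.Module):
        def __init__(self, n_prompts: int = 8, patch_dim: int = 768, bart_dim: int = 768):
            super().__init__()
            # Learnable soft prompts, one vector per prompt slot.
            self.prompts = nn.Parameter(torch.randn(n_prompts, patch_dim) * 0.02)
            # Projection of visual features into BART's embedding space.
            self.proj = nn.Linear(patch_dim, bart_dim)

        def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
            # patch_embeds: (batch, n_patches, patch_dim), e.g. from a ViT.
            batch = patch_embeds.size(0)
            prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
            # Prepend the soft prompts to the patch sequence, then project.
            visual_seq = torch.cat([prompts, patch_embeds], dim=1)
            return self.proj(visual_seq)  # (batch, n_prompts + n_patches, bart_dim)

    # Usage: the projected visual sequence can be prepended to BART's text
    # token embeddings before the encoder, letting visual content flow
    # into the pre-trained language model.
    encoder = PromptGuidedImageEncoder()
    patches = torch.randn(2, 49, 768)   # e.g. a 7x7 grid of ViT patch features
    visual_tokens = encoder(patches)    # (2, 8 + 49, 768)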
Pages: 195-204
Number of pages: 10
Related Papers
50 records in total
  • [21] CPM: A large-scale generative Chinese Pre-trained language model
    Zhang, Zhengyan
    Han, Xu
    Zhou, Hao
    Ke, Pei
    Gu, Yuxian
    Ye, Deming
    Qin, Yujia
    Su, Yusheng
    Ji, Haozhe
    Guan, Jian
    Qi, Fanchao
    Wang, Xiaozhi
    Zheng, Yanan
    Zeng, Guoyang
    Cao, Huanqi
    Chen, Shengqi
    Li, Daixuan
    Sun, Zhenbo
    Liu, Zhiyuan
    Huang, Minlie
    Han, Wentao
    Tang, Jie
    Li, Juanzi
    Zhu, Xiaoyan
    Sun, Maosong
    AI OPEN, 2021, 2 : 93 - 99
  • [22] Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos
    Liu, Nayu
    Sun, Xian
    Yu, Hongfeng
    Zhang, Wenkai
    Xu, Guangluan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1834 - 1845
  • [23] Hyperbolic Pre-Trained Language Model
    Chen, Weize
    Han, Xu
    Lin, Yankai
    He, Kaichen
    Xie, Ruobing
    Zhou, Jie
    Liu, Zhiyuan
    Sun, Maosong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3101 - 3112
  • [24] Schema matching based on energy domain pre-trained language model
    Pan, Z.
    Yang, M.
    Monti, A.
    ENERGY INFORMATICS, 2023, 6 (Suppl 1)
  • [25] Adapting Pre-trained Language Models to Rumor Detection on Twitter
    Slimi, Hamda
    Bounhas, Ibrahim
    Slimani, Yahya
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2021, 27 (10) : 1128 - 1148
  • [26] Compression of Generative Pre-trained Language Models via Quantization
    Tao, Chaofan
    Hou, Lu
    Zhang, Wei
    Shang, Lifeng
    Jiang, Xin
    Liu, Qun
    Luo, Ping
    Wong, Ngai
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4821 - 4836
  • [27] Structured Pruning for Efficient Generative Pre-trained Language Models
    Tao, Chaofan
    Hou, Lu
    Bai, Haoli
    Wei, Jiansheng
    Jiang, Xin
    Liu, Qun
    Luo, Ping
    Wong, Ngai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 10880 - 10895
  • [28] OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
    Chen, Le
    Bhattacharjee, Arijit
    Ahmed, Nesreen
    Hasabnis, Niranjan
    Oren, Gal
    Vo, Vy
    Jannesari, Ali
    EURO-PAR 2024: PARALLEL PROCESSING, PT I, EURO-PAR 2024, 2024, 14801 : 121 - 134
  • [29] Pre-trained Language Model Representations for Language Generation
    Edunov, Sergey
    Baevski, Alexei
    Auli, Michael
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4052 - 4059
  • [30] scMulan: A Multitask Generative Pre-Trained Language Model for Single-Cell Analysis
    Bian, Haiyang
    Chen, Yixin
    Dong, Xiaomin
    Li, Chen
    Hao, Minsheng
    Chen, Sijie
    Hu, Jinyi
    Sun, Maosong
    Wei, Lei
    Zhang, Xuegong
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024, 2024, 14758 : 479 - 482