Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization

Cited by: 6
Authors
Lin, Dengtian [1 ]
Jing, Liqiang [1 ]
Song, Xuemeng [1 ]
Liu, Meng [2 ]
Sun, Teng [1 ]
Nie, Liqiang [3 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Shandong Jianzhu Univ, Jinan, Peoples R China
[3] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Summarization; Pre-trained Language Model; Prompt Learning;
DOI
10.1145/3539618.3591633
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its paired image, is a new yet challenging task. Although existing methods have achieved compelling success, they still suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they lack explicit modeling of critical information. To address these limitations, we propose a BART-MMSS framework, which adopts BART as the backbone. Specifically, we propose a prompt-guided image encoding module to extract the source image feature. It leverages several learnable soft prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS. Thereafter, we devise an explicit source critical token learning module that directly captures the critical tokens of the source sentence with reference to the source image, where we incorporate explicit supervision to improve performance. Extensive experiments on a public dataset fully validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module are easily understood by humans, which improves the interpretability of our model.
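
The abstract outlines two technical components: learnable soft prompts that guide how projected image patch features are injected into BART's encoder, and an explicitly supervised head that highlights critical source tokens. The sketch below is a minimal, hypothetical rendering of that idea using Hugging Face transformers; it assumes pre-extracted image patch features, and all names and dimensions (e.g., PromptGuidedBartMMSS, num_prompts, patch_dim) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of prompt-guided visual injection into BART plus an explicitly
# supervised key-token head. Hypothetical code, not the paper's released model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BartForConditionalGeneration


class PromptGuidedBartMMSS(nn.Module):
    def __init__(self, bart_name="facebook/bart-base", num_prompts=8, patch_dim=768):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(bart_name)
        hidden = self.bart.config.d_model
        # Learnable soft prompts that steer how visual patches are read by BART.
        self.soft_prompts = nn.Parameter(torch.randn(num_prompts, hidden) * 0.02)
        # Project pre-extracted image patch features into BART's embedding space.
        self.patch_proj = nn.Linear(patch_dim, hidden)
        # Token-level classifier used for explicit critical-token supervision.
        self.key_token_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, patch_feats,
                labels=None, key_token_labels=None):
        bsz = input_ids.size(0)
        tok_emb = self.bart.get_input_embeddings()(input_ids)   # (B, L, H)
        vis_emb = self.patch_proj(patch_feats)                   # (B, P, H)
        prompts = self.soft_prompts.unsqueeze(0).expand(bsz, -1, -1)
        # Encoder input layout: [soft prompts | visual patches | source tokens].
        inputs_embeds = torch.cat([prompts, vis_emb, tok_emb], dim=1)
        extra = prompts.size(1) + vis_emb.size(1)
        full_mask = torch.cat(
            [torch.ones(bsz, extra, dtype=attention_mask.dtype,
                        device=attention_mask.device), attention_mask], dim=1)
        out = self.bart(inputs_embeds=inputs_embeds, attention_mask=full_mask,
                        labels=labels)
        loss = out.loss if labels is not None else None
        if key_token_labels is not None:
            # Score only the source-token positions; key_token_labels are assumed
            # to be per-token binary (pseudo-)labels marking critical tokens.
            enc_states = out.encoder_last_hidden_state[:, extra:, :]
            key_logits = self.key_token_head(enc_states).squeeze(-1)
            key_loss = F.binary_cross_entropy_with_logits(
                key_logits, key_token_labels.float(),
                weight=attention_mask.float())
            loss = key_loss if loss is None else loss + key_loss
        return loss, out.logits
```

In this sketch the summary cross-entropy loss and the key-token loss are simply summed, which is one plausible but assumed way to combine the generation objective with the explicit critical-token supervision described in the abstract.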
Pages: 195-204
Number of pages: 10