Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization

Cited by: 6
Authors
Lin, Dengtian [1 ]
Jing, Liqiang [1 ]
Song, Xuemeng [1 ]
Liu, Meng [2 ]
Sun, Teng [1 ]
Nie, Liqiang [3 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Shandong Jianzhu Univ, Jinan, Peoples R China
[3] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Summarization; Pre-trained Language Model; Prompt Learning;
DOI
10.1145/3539618.3591633
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its paired image, is a new yet challenging task. Although existing methods have achieved compelling success, they suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they lack explicit modeling of critical information. To address these limitations, we propose the BART-MMSS framework, which adopts BART as the backbone. Specifically, we propose a prompt-guided image encoding module to extract the source image feature. It leverages several learnable soft prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS. Thereafter, we devise an explicit source critical token learning module that directly captures the critical tokens of the source sentence with reference to the source image, incorporating explicit supervision to improve performance. Extensive experiments on a public dataset validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module are easily understood by humans, which improves the interpretability of our model.
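The abstract describes the architecture in prose only; the following PyTorch sketch illustrates how its two modules could be wired around a HuggingFace BART backbone. This is a minimal illustration under stated assumptions, not the authors' released code: the names `BartMMSSSketch`, `num_prompts`, `patch_feats`, and `key_token_labels` are hypothetical, and the choice of patch encoder, prompt count, and equal loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BartForConditionalGeneration


class BartMMSSSketch(nn.Module):
    """Illustrative sketch of the two modules described in the abstract."""

    def __init__(self, num_prompts: int = 4, patch_dim: int = 768):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
        d_model = self.bart.config.d_model
        # Prompt-guided image encoding: learnable soft prompts plus a linear
        # projection mapping image-patch features into BART's embedding space.
        self.soft_prompts = nn.Parameter(torch.randn(num_prompts, d_model) * 0.02)
        self.patch_proj = nn.Linear(patch_dim, d_model)
        # Explicit critical-token learning: binary head over source-token states.
        self.key_token_head = nn.Linear(d_model, 2)

    def forward(self, input_ids, attention_mask, patch_feats, labels, key_token_labels):
        # input_ids, attention_mask: (B, T) tokenized source sentence.
        # patch_feats: (B, P, patch_dim) patches from a frozen image encoder (assumed).
        # labels: (B, L) tokenized target summary.
        # key_token_labels: (B, T) long tensor in {0, 1}, -100 on padding positions.
        bsz = input_ids.size(0)
        txt_embeds = self.bart.get_input_embeddings()(input_ids)      # (B, T, d)
        vis_embeds = self.patch_proj(patch_feats)                      # (B, P, d)
        prompts = self.soft_prompts.unsqueeze(0).expand(bsz, -1, -1)   # (B, K, d)
        inputs_embeds = torch.cat([prompts, vis_embeds, txt_embeds], dim=1)
        n_extra = prompts.size(1) + vis_embeds.size(1)
        full_mask = torch.cat(
            [attention_mask.new_ones(bsz, n_extra), attention_mask], dim=1)
        out = self.bart(inputs_embeds=inputs_embeds, attention_mask=full_mask,
                        labels=labels)
        # Supervise critical-token prediction on the text positions only.
        text_states = out.encoder_last_hidden_state[:, n_extra:, :]    # (B, T, d)
        key_logits = self.key_token_head(text_states)
        key_loss = F.cross_entropy(key_logits.reshape(-1, 2),
                                   key_token_labels.reshape(-1), ignore_index=-100)
        return out.loss + key_loss  # summarization loss + auxiliary key-token loss
```

Prepending the soft prompts and projected patch embeddings to the token embeddings injects visual content without modifying BART itself, while the auxiliary cross-entropy term realizes the explicit supervision over critical source tokens that the abstract mentions.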
Pages: 195-204
Number of pages: 10
Related Papers
50 records in total
  • [1] Vision Enhanced Generative Pre-trained Language Model for Multimodal Sentence Summarization
    Jing, Liqiang
    Li, Yiren
    Xu, Junhao
    Yu, Yongcan
    Shen, Pei
    Song, Xuemeng
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (02): 289-298
  • [2] Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
    Yu, Tiezheng
    Dai, Wenliang
    Liu, Zihan
    Fung, Pascale
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 3995-4007
  • [3] Biomedical-domain pre-trained language model for extractive summarization
    Du, Yongping
    Li, Qingxiao
    Wang, Lulin
    He, Yanqing
    KNOWLEDGE-BASED SYSTEMS, 2020, 199
  • [4] Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
    Chen, Wenhu
    Verga, Pat
    de Jong, Michiel
    Wieting, John
    Cohen, William W.
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023: 1597-1610
  • [5] Pre-trained language models with domain knowledge for biomedical extractive summarization
    Xie, Q.
    Bishop, J. A.
    Tiwari, P.
    Ananiadou, S.
    KNOWLEDGE-BASED SYSTEMS, 2022, 252
  • [6] Knowledge Enhanced Pre-trained Language Model for Product Summarization
    Yin, Wenbo
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Liu, Lang
    Cheng, Zhen
    Wang, Sibo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552: 263-273
  • [7] Adapting Pre-trained Generative Model to Medical Image for Data Augmentation
    Yuan, Zhouhang
    Fang, Zhengqing
    Huang, Zhengxing
    Wu, Fei
    Yao, Yu-Feng
    Li, Yingming
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005: 79-89
  • [8] SiBert: Enhanced Chinese Pre-trained Language Model with Sentence Insertion
    Chen, Jiahao
    Cao, Chenjie
    Jiang, Xiuyan
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 2405-2412
  • [9] Evaluating the Summarization Comprehension of Pre-Trained Language Models
    Chernyshev, D. I.
    Dobrov, B. V.
    LOBACHEVSKII JOURNAL OF MATHEMATICS, 2023, 44 (08): 3028-3039