Improved image captioning with subword units training and transformer

Cited: 0
Authors
Cai Q. [1 ,2 ,3 ]
Li J. [1 ,2 ,3 ]
Li H. [1 ,2 ,3 ]
Zuo M. [1 ,2 ,3 ]
Affiliations
[1] School of Computer and Information Engineering, Beijing Technology and Business University, Beijing
[2] Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing
[3] National Engineering Laboratory for Agri-Product Quality Traceability, Beijing
Funding
National Natural Science Foundation of China;
Keywords
Byte pair encoding (BPE); Image captioning; Reinforcement learning; Transformer;
DOI
10.3772/j.issn.1006-6748.2020.02.011
Abstract
Image captioning models typically operate with a fixed vocabulary, yet captioning is an open-vocabulary problem. Existing work handles out-of-vocabulary words by mapping them to an unknown token in the dictionary. In addition, the recurrent neural network (RNN) and its variants used for the captioning task have become a bottleneck in both generation quality and training time. To address these two problems, a simpler but more effective approach is proposed: captions are generated over an open vocabulary of byte pair encoding (BPE) subword units, and the long short-term memory (LSTM) decoder is replaced with a transformer for better caption quality and shorter training time. The effectiveness of different word segmentation vocabularies and the generation improvement of the transformer over the LSTM are discussed, and the improved models are shown to achieve state-of-the-art performance on the MSCOCO2014 image captioning task over a back-off dictionary baseline model. Copyright © by HIGH TECHNOLOGY LETTERS PRESS.
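For context, the open-vocabulary idea rests on segmenting words into subword units so that a rare word decomposes into known pieces rather than an unknown token. Below is a minimal, illustrative sketch of standard byte pair encoding in the spirit of the paper's keywords; the toy corpus, merge count, and the function names learn_bpe and segment are assumptions for illustration, not the authors' implementation.

from collections import Counter

def learn_bpe(corpus, num_merges):
    # Illustrative BPE training: each word starts as a sequence of
    # characters plus an end-of-word marker, and the most frequent
    # adjacent symbol pair is merged repeatedly.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    # Apply the learned merges in order, so any word, even one never
    # seen in training, decomposes into known subword units.
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

merges = learn_bpe(["snowboard", "snowman", "skateboard"], num_merges=10)
print(segment("snowboarding", merges))  # rare word split into subwords, no <unk>

With such a vocabulary, the decoder predicts subword tokens that are rejoined at the end-of-word markers to recover full words, which is what lets the model caption objects it never saw as whole words during training.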
Pages: 211-216
Page count: 5
Related papers
50 items in total
  • [41] Caption TLSTMs: combining transformer with LSTMs for image captioning
    Yan, Jie
    Xie, Yuxiang
    Luan, Xidao
    Guo, Yanming
    Gong, Quanzhi
    Feng, Suru
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (02) : 111 - 121
  • [42] Reinforcement Learning Transformer for Image Captioning Generation Model
    Yan, Zhaojie
    FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022, 2023, 12701
  • [43] Improving Stylized Image Captioning with Better Use of Transformer
    Tan, Yutong
    Lin, Zheng
    Liu, Huan
    Zuo, Fan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 347 - 358
  • [44] Graph Alignment Transformer for More Grounded Image Captioning
    Tian, Canwei
    Hu, Haiyang
    Li, Zhongjin
    2022 INTERNATIONAL CONFERENCE ON INDUSTRIAL IOT, BIG DATA AND SUPPLY CHAIN, IIOTBDSC, 2022, : 95 - 102
  • [45] Visual contextual relationship augmented transformer for image captioning
    Su, Qiang
    Hu, Junbo
    Li, Zhixin
    APPLIED INTELLIGENCE, 2024, 54 (06) : 4794 - 4813
  • [46] Spiking-Transformer Optimization on FPGA for Image Classification and Captioning
    Udeji, Uchechukwu Leo
    Margala, Martin
    SOUTHEASTCON 2024, 2024, : 1353 - 1357
  • [47] Dual-Level Collaborative Transformer for Image Captioning
    Luo, Yunpeng
    Ji, Jiayi
    Sun, Xiaoshuai
    Cao, Liujuan
    Wu, Yongjian
    Huang, Feiyue
    Lin, Chia-Wen
    Ji, Rongrong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2286 - 2293
  • [48] A Sparse Transformer-Based Approach for Image Captioning
    Lei, Zhou
    Zhou, Congcong
    Chen, Shengbo
    Huang, Yiyong
    Liu, Xianrui
    IEEE ACCESS, 2020, 8 : 213437 - 213446
  • [50] Transformer with token attention and attribute prediction for image captioning
    Song, Lifei
    Wang, Ying
    Shi, Linsu
    Yu, Jiazhong
    Li, Fei
    Xiang, Shiming
    PATTERN RECOGNITION LETTERS, 2025, 188 : 74 - 80