Improved image captioning with subword units training and transformer

Cited by: 0
Authors
Cai Q. [1,2,3]
Li J. [1,2,3]
Li H. [1,2,3]
Zuo M. [1,2,3]
Affiliations
[1] School of Computer and Information Engineering, Beijing Technology and Business University, Beijing
[2] Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing
[3] National Engineering Laboratory for Agri-Product Quality Traceability, Beijing
Funding
National Natural Science Foundation of China;
Keywords
Byte pair encoding (BPE); Image captioning; Reinforcement learning; Transformer;
DOI
10.3772/j.issn.1006-6748.2020.02.011
Abstract
Image captioning models typically operate over a fixed vocabulary, but captioning is an open-vocabulary problem. Existing work handles out-of-vocabulary words by mapping them to an unknown token in the dictionary. In addition, the recurrent neural network (RNN) and its variants used as caption decoders have become a bottleneck in both generation quality and training time. To address these two problems, a simpler but more effective approach is proposed: captions are generated over an open vocabulary of subword units, and the long short-term memory (LSTM) decoder is replaced with a transformer for better caption quality and shorter training time. The effectiveness of different word-segmentation vocabularies and the improvement of the transformer over the LSTM are analyzed, and the improved models are shown to achieve state-of-the-art performance on the MSCOCO2014 image captioning task over a back-off dictionary baseline model. Copyright © by HIGH TECHNOLOGY LETTERS PRESS.
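The abstract only names the techniques involved; as a rough illustration of the byte pair encoding (BPE) step listed in the keywords, the Python sketch below learns subword merge operations from a toy word-frequency table. It follows the standard merge-based BPE learning loop; the toy vocabulary, the merge count, and the function names are illustrative assumptions, not details taken from the paper.

```python
# Minimal BPE subword-learning sketch (standard merge-based algorithm).
# The toy vocabulary and merge count are illustrative, not from the paper.
import re
from collections import Counter


def get_pair_stats(vocab):
    """Count frequencies of adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Merge the chosen symbol pair into a single symbol in every word."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}


# Toy caption-side vocabulary: each word is a space-separated symbol sequence
# ending with the end-of-word marker '</w>'.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

merges = []
for _ in range(10):                   # number of merges caps the subword vocabulary size
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    merges.append(best)

print(merges)  # ordered merge operations, e.g. ('e', 's'), ('es', 't'), ...
```

Applying the learned merges in the same order at captioning time segments any word, including rare or unseen ones, into known subword units, which is how a subword vocabulary avoids the unknown-token treatment of out-of-vocabulary words described in the abstract.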
Pages: 211-216
Page count: 5
Related Papers
50 papers in total
  • [21] Context-aware transformer for image captioning
    Yang, Xin
    Wang, Ying
    Chen, Haishun
    Li, Jie
    Huang, Tingting
    NEUROCOMPUTING, 2023, 549
  • [22] A Position-Aware Transformer for Image Captioning
    Deng, Zelin
    Zhou, Bo
    He, Pei
    Huang, Jianfeng
    Alfarraj, Osama
    Tolba, Amr
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (01): 2065 - 2081
  • [23] Full-Memory Transformer for Image Captioning
    Lu, Tongwei
    Wang, Jiarong
    Min, Fen
    SYMMETRY-BASEL, 2023, 15 (01):
  • [25] Retrieval-Augmented Transformer for Image Captioning
    Sarto, Sara
    Cornia, Marcella
    Baraldi, Lorenzo
    Cucchiara, Rita
    19TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2022, 2022, : 1 - 7
  • [26] Input enhanced asymmetric transformer for image captioning
    Zhu, Chenhao
    Ye, Xia
    Lu, Qiduo
    Signal, Image and Video Processing, 2023, 17 : 1419 - 1427
  • [27] Dual Global Enhanced Transformer for image captioning
    Xian, Tiantao
    Li, Zhixin
    Zhang, Canlong
    Ma, Huifang
    NEURAL NETWORKS, 2022, 148 : 129 - 141
  • [28] Attention-Aligned Transformer for Image Captioning
    Fei, Zhengcong
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 607 - 615
  • [29] Context-assisted Transformer for Image Captioning
    Lian Z.
    Wang R.
    Li H.-C.
    Yao H.
    Hu X.-H.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (09): 1889 - 1903
  • [30] Dual Position Relationship Transformer for Image Captioning
    Wang, Yaohan
    Qian, Wenhua
    Nie, Rencan
    Xu, Dan
    Cao, Jinde
    Kim, Pyoungwon
    BIG DATA, 2022, 10 (06) : 515 - 527