Improved image captioning with subword units training and transformer

Cited by: 0
Authors
Cai Q. [1 ,2 ,3 ]
Li J. [1 ,2 ,3 ]
Li H. [1 ,2 ,3 ]
Zuo M. [1 ,2 ,3 ]
Affiliations
[1] School of Computer and Information Engineering, Beijing Technology and Business University, Beijing
[2] Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing
[3] National Engineering Laboratory for Agri-Product Quality Traceability, Beijing
Funding
National Natural Science Foundation of China;
Keywords
Byte pair encoding (BPE); Image captioning; Reinforcement learning; Transformer;
DOI
10.3772/j.issn.1006-6748.2020.02.011
Abstract
Image captioning models typically operate with a fixed vocabulary, yet captioning is inherently an open-vocabulary problem. Existing work handles out-of-vocabulary words by mapping them to an unknown token in the dictionary. In addition, the recurrent neural network (RNN) and its variants used in the captioning task have become a bottleneck in both generation quality and training time. To address these two problems, a simpler but more effective approach is proposed for generating open-vocabulary captions: words are segmented into subword units, and the long short-term memory (LSTM) decoder is replaced with a transformer for better caption quality and lower training time. The effectiveness of different word-segmentation vocabularies and the generation improvement of the transformer over the LSTM are discussed, and it is shown that the improved models achieve state-of-the-art performance on the MSCOCO2014 image captioning task over a back-off dictionary baseline model. Copyright © by HIGH TECHNOLOGY LETTERS PRESS.
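The byte pair encoding (BPE) named in the keywords is the standard way to build such a subword vocabulary: starting from characters, the most frequent adjacent symbol pair is repeatedly merged, so rare words decompose into known subword units instead of an unknown token. The sketch below is a minimal illustration of BPE merge learning (not the paper's implementation); the toy corpus and merge count are arbitrary.

```python
from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs across all words, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the chosen pair with its merged symbol.
    old = " ".join(pair)
    new = "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

def learn_bpe(corpus, num_merges):
    # Represent each word as space-separated characters plus an end-of-word marker.
    vocab = Counter(" ".join(w) + " </w>" for w in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (ties: first seen)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges

# Toy corpus: frequent suffixes like "est" become single subword units.
merges = learn_bpe(["low", "lower", "lowest", "newest", "widest"], 6)
```

At decoding time the learned merge list is applied in order to segment any word, seen or unseen, into subword units from the fixed vocabulary, which is what makes the caption vocabulary effectively open.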
Pages: 211-216
Number of pages: 5
Related Papers
50 items in total
  • [31] SPT: Spatial Pyramid Transformer for Image Captioning
    Zhang, Haonan
    Zeng, Pengpeng
    Gao, Lianli
    Lyu, Xinyu
    Song, Jingkuan
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (06) : 4829 - 4842
  • [32] Position-guided transformer for image captioning
    Hu, Juntao
    Yang, You
    Yao, Lu
    An, Yongzhi
    Pan, Longyue
    IMAGE AND VISION COMPUTING, 2022, 128
  • [33] Input enhanced asymmetric transformer for image captioning
    Zhu, Chenhao
    Ye, Xia
    Lu, Qiduo
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1419 - 1427
  • [34] Semi-Autoregressive Transformer for Image Captioning
    Zhou, Yuanen
    Zhang, Yong
    Hu, Zhenzhen
    Wang, Meng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3132 - 3136
  • [35] HIST: Hierarchical and sequential transformer for image captioning
    Lv, Feixiao
    Wang, Rui
    Jing, Lihua
    Dai, Pengwen
    IET COMPUTER VISION, 2024, 18 (07) : 1043 - 1056
  • [36] Text to Image Synthesis for Improved Image Captioning
    Hossain, Md. Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    IEEE ACCESS, 2021, 9 : 64918 - 64928
  • [37] Training for Diversity in Image Paragraph Captioning
    Melas-Kyriazi, Luke
    Han, George
    Rush, Alexander M.
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 757 - 761
  • [38] Efficient Image Captioning Based on Vision Transformer Models
    Elbedwehy, Samar
    Medhat, T.
    Hamza, Taher
    Alrahmawy, Mohammed F.
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (01): : 1483 - 1500
  • [39] External knowledge-assisted Transformer for image captioning
    Li, Zhixin
    Su, Qiang
    Chen, Tianyu
    IMAGE AND VISION COMPUTING, 2023, 140
  • [40] Dual-Spatial Normalized Transformer for image captioning
    Hu, Juntao
    Yang, You
    An, Yongzhi
    Yao, Lu
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123