Improved image captioning with subword units training and transformer

Cited by: 0
Authors
Cai Q. [1 ,2 ,3 ]
Li J. [1 ,2 ,3 ]
Li H. [1 ,2 ,3 ]
Zuo M. [1 ,2 ,3 ]
Affiliations
[1] School of Computer and Information Engineering, Beijing Technology and Business University, Beijing
[2] Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing
[3] National Engineering Laboratory for Agri-Product Quality Traceability, Beijing
Funding
National Natural Science Foundation of China
Keywords
Byte pair encoding (BPE); Image captioning; Reinforcement learning; Transformer;
DOI
10.3772/j.issn.1006-6748.2020.02.011
Abstract
Image captioning models typically operate with a fixed vocabulary, yet captioning is inherently an open-vocabulary problem. Existing work handles out-of-vocabulary words by mapping them to an unknown token in the dictionary. In addition, the recurrent neural network (RNN) and its variants used as caption decoders have become a bottleneck in both generation quality and training time. To address these two problems, a simpler but more effective approach to open-vocabulary caption generation is proposed, and the long short-term memory (LSTM) decoder is replaced with a transformer for better caption quality and shorter training time. The effectiveness of different word-segmentation vocabularies and the improvement of the transformer over the LSTM are discussed, and it is shown that the improved models achieve state-of-the-art performance on the MSCOCO2014 image captioning task over a back-off dictionary baseline model. Copyright © by HIGH TECHNOLOGY LETTERS PRESS.
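Since the keywords name byte pair encoding (BPE) as the subword-segmentation technique, the following is a minimal illustrative sketch of how BPE merge rules are learned; the toy corpus and merge count are invented for the example and are not taken from the paper or its code.

```python
# Toy byte pair encoding (BPE): repeatedly merge the most frequent
# adjacent symbol pair, so frequent words collapse into whole tokens
# while rare words stay decomposed into subword units.
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count adjacent symbol pairs across the (word -> frequency) vocab."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the chosen symbol pair with one merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words are space-separated characters plus an end-of-word marker </w>.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):  # learn 10 merge operations
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
print(vocab)
```

After a few merges, frequent words such as "low" and "newest" become single tokens, while the rarer "wider"-style words remain split into subwords; an out-of-vocabulary word at caption time can always be segmented into known subword units instead of being replaced by an unknown token.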
Pages: 211-216
Page count: 5