Improved image captioning with subword units training and transformer

Cited: 0
Authors
Cai Q. [1 ,2 ,3 ]
Li J. [1 ,2 ,3 ]
Li H. [1 ,2 ,3 ]
Zuo M. [1 ,2 ,3 ]
Affiliations
[1] School of Computer and Information Engineering, Beijing Technology and Business University, Beijing
[2] Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing
[3] National Engineering Laboratory for Agri-Product Quality Traceability, Beijing
Funding
National Natural Science Foundation of China;
Keywords
Byte pair encoding (BPE); Image captioning; Reinforcement learning; Transformer;
DOI
10.3772/j.issn.1006-6748.2020.02.011
Abstract
Image captioning models typically operate with a fixed vocabulary, yet captioning is an open-vocabulary problem. Existing work handles out-of-vocabulary words by mapping them to an unknown token in the dictionary. In addition, the recurrent neural network (RNN) and its variants used for the captioning task have become a bottleneck in both generation quality and training time. To address these two problems, a simpler but more effective approach is proposed: captions are generated over an open vocabulary of byte pair encoding (BPE) subword units, and the long short-term memory (LSTM) decoder is replaced with a transformer for better caption quality and shorter training time. The effectiveness of different word segmentation vocabularies and the generation improvement of the transformer over the LSTM are discussed, and the improved models are shown to achieve state-of-the-art performance on the MSCOCO2014 image captioning task over a back-off dictionary baseline model. Copyright © by HIGH TECHNOLOGY LETTERS PRESS.
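For context, the open-vocabulary idea rests on segmenting words into subword units so that a rare word decomposes into known pieces rather than an unknown token. Below is a minimal, illustrative sketch of standard byte pair encoding in the spirit of the paper's keywords; the toy corpus, merge count, and the function names learn_bpe and segment are assumptions for illustration, not the authors' implementation.

from collections import Counter

def learn_bpe(corpus, num_merges):
    # Illustrative BPE training: each word starts as a sequence of
    # characters plus an end-of-word marker, and the most frequent
    # adjacent symbol pair is merged repeatedly.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    # Apply the learned merges in order, so any word, even one never
    # seen in training, decomposes into known subword units.
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

merges = learn_bpe(["snowboard", "snowman", "skateboard"], num_merges=10)
print(segment("snowboarding", merges))  # rare word split into subwords, no <unk>

With such a vocabulary, the decoder predicts subword tokens that are rejoined at the end-of-word markers to recover full words, which is what lets the model caption objects it never saw as whole words during training.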
Pages: 211-216
Page count: 5
Related papers
50 items in total
  • [41] Caption TLSTMs: combining transformer with LSTMs for image captioning
    Yan, Jie
    Xie, Yuxiang
    Luan, Xidao
    Guo, Yanming
    Gong, Quanzhi
    Feng, Suru
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (02) : 111 - 121
  • [42] Reinforcement Learning Transformer for Image Captioning Generation Model
    Yan, Zhaojie
    FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022, 2023, 12701
  • [43] Improving Stylized Image Captioning with Better Use of Transformer
    Tan, Yutong
    Lin, Zheng
    Liu, Huan
    Zuo, Fan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 347 - 358
  • [44] Graph Alignment Transformer for More Grounded Image Captioning
    Tian, Canwei
    Hu, Haiyang
    Li, Zhongjin
    2022 INTERNATIONAL CONFERENCE ON INDUSTRIAL IOT, BIG DATA AND SUPPLY CHAIN, IIOTBDSC, 2022, : 95 - 102
  • [45] Visual contextual relationship augmented transformer for image captioning
    Su, Qiang
    Hu, Junbo
    Li, Zhixin
    APPLIED INTELLIGENCE, 2024, 54 (06) : 4794 - 4813
  • [46] Spiking-Transformer Optimization on FPGA for Image Classification and Captioning
    Udeji, Uchechukwu Leo
    Margala, Martin
    SOUTHEASTCON 2024, 2024, : 1353 - 1357
  • [47] Dual-Level Collaborative Transformer for Image Captioning
    Luo, Yunpeng
    Ji, Jiayi
    Sun, Xiaoshuai
    Cao, Liujuan
    Wu, Yongjian
    Huang, Feiyue
    Lin, Chia-Wen
    Ji, Rongrong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2286 - 2293
  • [48] A Sparse Transformer-Based Approach for Image Captioning
    Lei, Zhou
    Zhou, Congcong
    Chen, Shengbo
    Huang, Yiyong
    Liu, Xianrui
    IEEE ACCESS, 2020, 8 : 213437 - 213446
  • [50] Transformer with token attention and attribute prediction for image captioning
    Song, Lifei
    Wang, Ying
    Shi, Linsu
    Yu, Jiazhong
    Li, Fei
    Xiang, Shiming
    PATTERN RECOGNITION LETTERS, 2025, 188 : 74 - 80