Complementary Shifted Transformer for Image Captioning

Cited by: 1
Authors
Liu, Yanbo [1]
Yang, You [2]
Xiang, Ruoyu [1]
Ma, Jixin [1]
Affiliations
[1] Chongqing Normal Univ, Sch Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Natl Ctr Appl Math Chongqing, Chongqing 401331, Peoples R China
Keywords
Image captioning; Transformer; Positional encoding; Multi-branch self-attention; Spatial shift
DOI
10.1007/s11063-023-11314-0
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformer-based models have dominated many vision and language tasks, including image captioning. However, such models still suffer from limited expressive ability and from information loss during dimensionality reduction. To address these problems, this paper proposes a Complementary Shifted Transformer (CST) for image captioning. We first introduce a complementary Multi-branch Bi-positional encoding Self-Attention (MBSA) module. It uses both absolute and relative positional encoding to learn precise positional representations. MBSA is also equipped with a Multi-Branch Architecture, which replicates multiple branches for each head. To improve the expressive ability of the model, we use the drop branch technique, which trains the branches in a complementary way. Furthermore, we propose a Spatial Shift Augmented module, which takes advantage of both low-level and high-level features to enhance visual features with fewer parameters. To validate our model, we conduct extensive experiments on the MSCOCO benchmark dataset. Compared to state-of-the-art methods, the proposed CST achieves a competitive performance of 135.3% CIDEr (+0.2%) on the Karpathy split and 136.3% CIDEr (+0.9%) on the official online test server. In addition, we evaluate the inference performance of our model on a novel object dataset. The source code and trained models are publicly available at https://github.com/noonisy/CST.
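The abstract only names the drop branch technique, so the following is a rough, generic sketch of the idea rather than the authors' implementation: during training, each parallel branch is dropped at random and the surviving branch outputs are averaged, while at inference all branches are averaged. The function name `combine_branches`, the `drop_prob` parameter, the "always keep at least one branch" rule, and the plain averaging are assumptions made for illustration; the actual code is in the repository linked above.

```python
import random


def combine_branches(branch_outputs, drop_prob=0.3, training=True, rng=None):
    """Average the outputs of parallel branches (hypothetical helper).

    branch_outputs: list of equal-length lists of floats, one per branch.
    drop_prob: assumed probability of dropping each branch while training.
    """
    rng = rng or random.Random()
    n = len(branch_outputs)
    if n == 0:
        raise ValueError("need at least one branch output")

    if training:
        # Drop each branch independently with probability drop_prob.
        keep = [rng.random() >= drop_prob for _ in range(n)]
        # Assumption: never drop every branch; restore one at random.
        if not any(keep):
            keep[rng.randrange(n)] = True
    else:
        # At inference, every branch contributes.
        keep = [True] * n

    kept = [out for out, k in zip(branch_outputs, keep) if k]
    dim = len(kept[0])
    # Element-wise mean over the surviving branches.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

Because different subsets of branches survive at each training step, no single branch can rely on the others always being present, which is one common reading of training branches "in a complementary way".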
Pages: 8339-8363
Number of pages: 25