Temporal Convolutional and Recurrent Networks for Image Captioning

Cited by: 0
Authors
Iskra, Natalia [1]
Iskra, Vitaly [2]
Affiliations
[1] Belarusian State University of Informatics and Radioelectronics, Minsk, Belarus
[2] Omnigon Communications LLC, New York, NY, USA
Keywords
Image captioning; Convolutional neural networks; Recurrent neural networks; Visual Genome; Dilated convolution; Weight normalization; Dropout; Adam optimization
DOI
10.1007/978-3-030-35430-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, temporal convolutional networks have shown excellent performance in sequence modeling tasks [1]. With this in mind, in this paper we investigate whether they can replace recurrent networks in architectures designed specifically for image captioning. We evaluate the approach on the Visual Genome dataset [2], which provides an extensive set of labels and descriptions that thoroughly ground visual concepts in natural language.
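The record does not describe the authors' architecture in detail. As background only: temporal convolutional networks are generally built from causal, dilated 1-D convolutions (dilated convolution also appears among the keywords). The sketch below illustrates that building block in pure Python under simplifying assumptions: a single channel, zero left-padding, and hypothetical helper names (`causal_dilated_conv1d`, `receptive_field`). It omits weight normalization, dropout, residual connections and multiple channels, and it is not the paper's implementation.

```python
from typing import List


def causal_dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Single-channel causal dilated 1-D convolution (illustrative sketch).

    Output position t depends only on x[t], x[t - d], x[t - 2d], ...,
    so no future time step leaks into the result. Positions before the
    start of the sequence are treated as zeros (left padding).
    """
    k = len(kernel)
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            # kernel[-1] multiplies the current step; kernel[0] the oldest tap
            idx = t - (k - 1 - i) * dilation
            if idx >= 0:
                acc += kernel[i] * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Time steps (including the current one) visible after stacking layers.

    Each layer with kernel size k and dilation d extends the view by (k - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # With kernel [1, 1] and dilation 2, out[t] = x[t] + x[t - 2]
    print(causal_dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth: four layers with kernel size 2 and dilations 1, 2, 4, 8 already cover 16 time steps, which is what lets a convolutional decoder see long contexts without recurrence.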
Pages: 254-266
Number of pages: 13
Related papers
  • [1] Image Captioning using Convolutional Neural Networks and Recurrent Neural Network
    Calvin, Rachel
    Suresh, Shravya
    2021 6th International Conference for Convergence in Technology (I2CT), 2021
  • [2] Survey of convolutional neural networks for image captioning
    Kalra, Saloni
    Leekha, Alka
    Journal of Information & Optimization Sciences, 2020, 41(1): 239-260
  • [3] Convolutional Image Captioning
    Aneja, Jyoti
    Deshpande, Aditya
    Schwing, Alexander G.
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 5561-5570
  • [4] A local representation-enhanced recurrent convolutional network for image captioning
    Wang, Xiaoyi
    Huang, Jun
    International Journal of Multimedia Information Retrieval, 2022, 11(2): 149-157
  • [5] Paragraph Image Captioning with Deep Fully Convolutional Neural Networks
    Li, R.-F.
    Liang, H.-Y.
    Feng, F.-X.
    Zhang, G.-W.
    Wang, X.-J.
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2019, 42(6): 155-161
  • [6] Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
    Chen, Jingwen
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 8167-8174
  • [7] Recurrent Neural Networks for Image Captioning: A Case Study with LSTM
    Mohite, Shailaja Sanjay
    Suganthini, C.
    Arunarani, A. R.
    Devi, K. Lalitha
    Sharma, Manish
    Patil, R. N.
    Shrivastava, Anurag
    Journal of Electrical Systems, 2024, 20(3): 1082-1092
  • [8] Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
    Dong, Xinzhi
    Long, Chengjiang
    Xu, Wenju
    Xiao, Chunxia
    Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 2615-2624
  • [9] Noise Augmented Double-Stream Graph Convolutional Networks for Image Captioning
    Wu, Lingxiang
    Xu, Min
    Sang, Lei
    Yao, Ting
    Mei, Tao
    IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(8): 3118-3127