Temporal Convolutional and Recurrent Networks for Image Captioning

Authors
Iskra, Natalia [1]
Iskra, Vitaly [2]
Affiliations
[1] Belarusian State Univ Informat & Radioelect, Minsk, BELARUS
[2] Omnigon Commun LLC, New York, NY USA
Keywords
Image captioning; Convolutional neural networks; Recurrent neural networks; Visual Genome; Dilated convolution; Weight normalization; Dropout; Adam optimization
DOI
10.1007/978-3-030-35430-5_21
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, temporal convolutional networks have shown excellent results in sequence modeling tasks [1]. With this in mind, we investigate whether they can replace recurrent networks in architectures designed specifically for image captioning. We evaluate the solution on the Visual Genome dataset [2], which provides an extensive set of labels and descriptions that thoroughly ground visual concepts in natural language.
Pages: 254-266
Page count: 13
Related papers
  • [21] SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
    Chen, Long
    Zhang, Hanwang
    Xiao, Jun
    Nie, Liqiang
    Shao, Jian
    Liu, Wei
    Chua, Tat-Seng
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6298 - 6306
  • [22] Deliberate Attention Networks for Image Captioning
    Gao, Lianli
    Fan, Kaixuan
    Song, Jingkuan
    Liu, Xianglong
    Xu, Xing
    Shen, Heng Tao
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8320 - 8327
  • [23] Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition
    Kim, Taejun
    Nam, Juhan
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 437 - 441
  • [24] AUTOMATED AUDIO CAPTIONING WITH RECURRENT NEURAL NETWORKS
    Drossos, Konstantinos
    Adavanne, Sharath
    Virtanen, Tuomas
    2017 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2017, : 374 - 378
  • [25] Spatial-Temporal Attention for Image Captioning
    Zhou, Junwei
    Wang, Xi
    Han, Jizhong
    Hu, Songlin
    Gao, Hongchao
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [26] DenseCap: Fully Convolutional Localization Networks for Dense Captioning
    Johnson, Justin
    Karpathy, Andrej
    Fei-Fei, Li
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 4565 - 4574
  • [27] Convolutional Recurrent Neural Networks: Learning Spatial Dependencies for Image Representation
    Zuo, Zhen
    Shuai, Bing
    Wang, Gang
    Liu, Xiao
    Wang, Xingxing
    Wang, Bing
    Chen, Yushi
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2015,
  • [28] Recurrent Relational Memory Network for Unsupervised Image Captioning
    Guo, Dan
    Wang, Yang
    Song, Peipei
    Wang, Meng
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 920 - 926
  • [29] Boosting convolutional image captioning with semantic content and visual relationship
    Bai, Cong
    Zheng, Anqi
    Huang, Yuan
    Pan, Xiang
    Chen, Nan
    DISPLAYS, 2021, 70
  • [30] FULLY CONVOLUTIONAL NETWORKS FOR MULTI-TEMPORAL SAR IMAGE CLASSIFICATION
    Mullissa, Adugna G.
    Persello, Claudio
    Tolpekin, Valentyn
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 6635 - 6638