Temporal Convolutional and Recurrent Networks for Image Captioning

Cited by: 0

Authors:

Iskra, Natalia [1]

Iskra, Vitaly [2]

Affiliations:

[1] Belarusian State Univ Informat & Radioelect, Minsk, BELARUS

[2] Omnigon Commun LLC, New York, NY USA

Source:

PATTERN RECOGNITION AND INFORMATION PROCESSING, PRIP 2019 | 2019 / Vol. 1055

Keywords:

Image captioning; Convolutional neural networks; Recurrent neural networks; Visual Genome; Dilated convolution; Weight normalization; Dropout; Adam optimization;

DOI:

10.1007/978-3-030-35430-5_21

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, temporal convolutional networks have shown excellent qualities in sequence modeling tasks [1]. Taking this into account, in this paper we investigate the possibility of replacing recurrent networks in architectures targeted specifically at image captioning. We evaluate the solution on the Visual Genome dataset [2], which provides an extensive set of labels and descriptions that thoroughly ground visual concepts to natural language.

Pages: 254-266

Page count: 13

50 items in total

[1] Image Captioning using Convolutional Neural Networks and Recurrent Neural Network
Calvin, Rachel
Suresh, Shravya
2021 6TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2021
[2] Survey of convolutional neural networks for image captioning
Kalra, Saloni
Leekha, Alka
JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2020, 41 (01) : 239 - 260
[3] Convolutional Image Captioning
Aneja, Jyoti
Deshpande, Aditya
Schwing, Alexander G.
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5561 - 5570
[4] A local representation-enhanced recurrent convolutional network for image captioning
Wang, Xiaoyi
Huang, Jun
INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (02) : 149 - 157
[5] Paragraph Image Captioning with Deep Fully Convolutional Neural Networks
Li R.-F.
Liang H.-Y.
Feng F.-X.
Zhang G.-W.
Wang X.-J.
Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2019, 42 (06) : 155 - 161
[6] Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
Chen, Jingwen
Pan, Yingwei
Li, Yehao
Yao, Ting
Chao, Hongyang
Mei, Tao
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8167 - 8174
[7] Recurrent Neural Networks for Image Captioning: A Case Study with LSTM
Mohite, Shailaja Sanjay
Suganthini, C.
Arunarani, A. R.
Devi, K. Lalitha
Sharma, Manish
Patil, R. N.
Shrivastava, Anurag
JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (03) : 1082 - 1092
[8] Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
Dong, Xinzhi
Long, Chengjiang
Xu, Wenju
Xiao, Chunxia
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2615 - 2624
[9] Noise Augmented Double-Stream Graph Convolutional Networks for Image Captioning
Wu, Lingxiang
Xu, Min
Sang, Lei
Yao, Ting
Mei, Tao
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (08) : 3118 - 3127
