Image Captioning with Compositional Neural Module Networks

Cited by: 0

Authors:

Tian, Junjiao [1]

Oh, Jean [1]

Affiliations:

[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

Source:

PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2019

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In image captioning, where fluency is an important factor in evaluation (e.g., in n-gram metrics), sequential models are commonly used; however, sequential models generally produce overgeneralized expressions that lack the details that may be present in an input image. Inspired by the idea of compositional neural module networks in the visual question answering task, we introduce a hierarchical framework for image captioning that explores both the compositionality and the sequentiality of natural language. Our algorithm learns to compose a detail-rich sentence by selectively attending to different modules corresponding to unique aspects of each object detected in an input image, so as to include specific descriptions such as counts and color. In a set of experiments on the MSCOCO dataset, the proposed model outperforms a state-of-the-art model across multiple evaluation metrics and, more importantly, presents visually interpretable results. Furthermore, the breakdown of the subcategory F-scores of the SPICE metric and a human evaluation on Amazon Mechanical Turk show that our compositional module networks effectively generate accurate and detailed captions.


Pages: 3576-3584

Page count: 9

50 entries in total

[1] A Neural Compositional Paradigm for Image Captioning
Dai, Bo
Fidler, Sanja
Lin, Dahua
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[2] Survey of convolutional neural networks for image captioning
Kalra, Saloni
Leekha, Alka
JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2020, 41 (01): 239-260
[3] Image Captioning using Convolutional Neural Networks and Recurrent Neural Network
Calvin, Rachel
Suresh, Shravya
2021 6TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2021
[4] Semantic Compositional Networks for Visual Captioning
Gan, Zhe
Gan, Chuang
He, Xiaodong
Pu, Yunchen
Tran, Kenneth
Gao, Jianfeng
Carin, Lawrence
Deng, Li
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1141-1150
[5] Image Captioning for Video Surveillance System using Neural Networks
Nivedita, M.
Chandrashekar, Priyanka
Mahapatra, Shibani
Phamila, Y. Asnath Victy
Selvaperumal, Sathish Kumar
INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2021, 21 (04)
[6] Paragraph Image Captioning with Deep Fully Convolutional Neural Networks
Li R.-F.
Liang H.-Y.
Feng F.-X.
Zhang G.-W.
Wang X.-J.
Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2019, 42 (06): 155-161
[7] Recurrent Neural Networks for Image Captioning: A Case Study with LSTM
Mohite, Shailaja Sanjay
Suganthini, C.
Arunarani, A. R.
Devi, K. Lalitha
Sharma, Manish
Patil, R. N.
Shrivastava, Anurag
JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (03): 1082-1092
[8] The Role of Syntactic Planning in Compositional Image Captioning
Bugliarello, Emanuele
Elliott, Desmond
16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021: 593-607
[9] Toward Backdoor Attacks for Image Captioning Model in Deep Neural Networks
Kwon, Hyun
Lee, Sanghyun
SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
[10] Compositional models for VQA: Can neural module networks really count?
Sejnova, Gabriela
Tesar, Michael
Vavrecka, Michal
POSTPROCEEDINGS OF THE 9TH ANNUAL INTERNATIONAL CONFERENCE ON BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES (BICA 2018), 2018, 145: 481-487
