Image Captioning with Compositional Neural Module Networks

Cited by: 0
Authors
Tian, Junjiao [1]
Oh, Jean [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In image captioning where fluency is an important factor in evaluation, e.g., n-gram metrics, sequential models are commonly used; however, sequential models generally result in overgeneralized expressions that lack the details that may be present in an input image. Inspired by the idea of the compositional neural module networks in the visual question answering task, we introduce a hierarchical framework for image captioning that explores both compositionality and sequentiality of natural language. Our algorithm learns to compose a detail-rich sentence by selectively attending to different modules corresponding to unique aspects of each object detected in an input image to include specific descriptions such as counts and color. In a set of experiments on the MSCOCO dataset, the proposed model outperforms a state-of-the-art model across multiple evaluation metrics and, more importantly, presents visually interpretable results. Furthermore, the breakdown of subcategory F-scores of the SPICE metric and human evaluation on Amazon Mechanical Turk show that our compositional module networks effectively generate accurate and detailed captions.
Pages: 3576-3584
Page count: 9