Image Captioning with Compositional Neural Module Networks

Cited: 0
Authors
Tian, Junjiao [1]
Oh, Jean [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In image captioning, where fluency is an important factor in evaluation (e.g., n-gram metrics), sequential models are commonly used; however, sequential models generally produce overgeneralized expressions that lack the details present in an input image. Inspired by the idea of compositional neural module networks in the visual question answering task, we introduce a hierarchical framework for image captioning that exploits both the compositionality and the sequentiality of natural language. Our algorithm learns to compose a detail-rich sentence by selectively attending to different modules corresponding to unique aspects of each object detected in an input image, so as to include specific descriptions such as counts and colors. In a set of experiments on the MSCOCO dataset, the proposed model outperforms a state-of-the-art model across multiple evaluation metrics while, more importantly, producing visually interpretable results. Furthermore, the breakdown of subcategory F-scores of the SPICE metric and human evaluation on Amazon Mechanical Turk show that our compositional module networks effectively generate accurate and detailed captions.
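The abstract describes a decoder that, at each step, attends over per-aspect modules (e.g., object, count, color) and mixes their outputs into a single context. The paper's actual architecture is not reproduced here; the following is a minimal NumPy sketch of that selective module-attention idea, with all names (`ModuleAttentionDecoder`, `step`, the module list, the key matrix) being illustrative assumptions rather than the authors' API.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


class ModuleAttentionDecoder:
    """Hypothetical sketch: a controller state queries one key per
    module and mixes the module output vectors by soft attention."""

    def __init__(self, module_names, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.module_names = module_names          # e.g. object/count/color
        self.keys = rng.normal(size=(len(module_names), dim))

    def step(self, controller_state, module_outputs):
        # attention score for each module: dot(query, key)
        scores = self.keys @ controller_state
        weights = softmax(scores)
        # convex combination of the per-module output vectors
        context = weights @ module_outputs
        return context, dict(zip(self.module_names, weights))


# Toy usage: three modules, 4-d features.
dec = ModuleAttentionDecoder(["object", "count", "color"], dim=4)
state = np.ones(4)                 # stand-in for the decoder hidden state
outs = np.eye(3, 4)                # stand-in per-module output vectors
context, attn = dec.step(state, outs)
```

Because the attention weights form a distribution over named modules, inspecting `attn` at each step is what makes such a decoder visually interpretable in the sense the abstract claims.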
Pages: 3576 - 3584
Page count: 9
Related papers
50 items total
  • [41] Effective Pre-Training Method and Its Compositional Intelligence for Image Captioning
    Choi, Won-Hyuk
    Choi, Yong-Suk
    SENSORS, 2022, 22 (09)
  • [42] Interactive Dual Generative Adversarial Networks for Image Captioning
    Liu, Junhao
    Wang, Kai
    Xu, Chunpu
    Zhao, Zhou
    Xu, Ruifeng
    Shen, Ying
    Yang, Min
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11588 - 11595
  • [43] Semantic Representations With Attention Networks for Boosting Image Captioning
    Hafeth, Deema Abdal
    Kollias, Stefanos
    Ghafoor, Mubeen
    IEEE ACCESS, 2023, 11 : 40230 - 40239
  • [44] Semantic-Conditional Diffusion Networks for Image Captioning
    Luo, Jianjie
    Li, Yehao
    Pan, Yingwei
    Yao, Ting
    Feng, Jianlin
    Chao, Hongyang
    Mei, Tao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23359 - 23368
  • [45] Image Captioning using Adversarial Networks and Reinforcement Learning
    Yan, Shiyang
    Wu, Fangyu
    Smith, Jeremy S.
    Lu, Wenjin
    Zhang, Bailing
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 248 - 253
  • [46] Action knowledge for video captioning with graph neural networks
    Hendria, Willy Fitra
    Velda, Vania
    Putra, Bahy Helmi Hartoyo
    Adzaka, Fikriansyah
    Jeong, Cheol
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (04) : 50 - 62
  • [47] Image captioning in Hindi language using transformer networks
    Mishra, Santosh Kumar
    Dhir, Rijul
    Saha, Sriparna
    Bhattacharyya, Pushpak
    Singh, Amit Kumar
    COMPUTERS & ELECTRICAL ENGINEERING, 2021, 92
  • [48] Introducing Concept And Syntax Transition Networks for Image Captioning
    Blandfort, Philipp
    Karayil, Tushar
    Borth, Damian
    Dengel, Andreas
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 385 - 388
  • [49] Image filtering module using neural networks for an optical character recognizing system
    Holeva, Lee F.
    Kadaba, Nagesh
Mathematical Modelling and Scientific Computing, 1993, 2 (Section B):
  • [50] HAM: Hybrid attention module in deep convolutional neural networks for image classification
    Li, Guoqiang
    Fang, Qi
    Zha, Linlin
    Gao, Xin
    Zheng, Nenggan
    PATTERN RECOGNITION, 2022, 129