Image Captioning with Compositional Neural Module Networks

Authors
Tian, Junjiao [1]
Oh, Jean [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In image captioning, where fluency is an important factor in evaluation (e.g., in n-gram metrics), sequential models are commonly used; however, sequential models generally produce overgeneralized expressions that lack the details that may be present in an input image. Inspired by the idea of compositional neural module networks in the visual question answering task, we introduce a hierarchical framework for image captioning that explores both the compositionality and the sequentiality of natural language. Our algorithm learns to compose a detail-rich sentence by selectively attending to different modules corresponding to unique aspects of each object detected in an input image, so as to include specific descriptions such as counts and color. In a set of experiments on the MSCOCO dataset, the proposed model outperforms a state-of-the-art model across multiple evaluation metrics and, more importantly, presents visually interpretable results. Furthermore, the breakdown of the subcategory F-scores of the SPICE metric and a human evaluation on Amazon Mechanical Turk show that our compositional module networks effectively generate accurate and detailed captions.
Pages: 3576-3584
Number of pages: 9