A Hierarchical Multimodal Attention-based Neural Network for Image Captioning

Cited by: 16
Authors
Cheng, Yong [1]
Huang, Fei [1]
Zhou, Lian [1]
Jin, Cheng [1]
Zhang, Yuejie [1]
Zhang, Tao [2]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Univ Finance & Econ, Sch Informat Management & Engn, Shanghai, Peoples R China
Keywords
Image Captioning; Multimodal Attention; Hierarchical Recurrent Neural Network; Long-Short Term Memory Model
DOI
10.1145/3077136.3080671
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A novel hierarchical multimodal attention-based model is developed in this paper to generate more accurate and descriptive captions for images. Our model is an end-to-end neural network that contains three related sub-networks: a deep convolutional neural network to encode image content, a recurrent neural network to identify the objects in an image sequentially, and a multimodal attention-based recurrent neural network to generate the image caption. The main contribution of our work is that the hierarchical structure and the multimodal attention mechanism are applied together, so that each caption word is generated with multimodal attention over both the intermediate semantic objects and the global visual content. Our experiments on two benchmark datasets obtained very positive results.
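The attention step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function or the network dimensions, so this example assumes dot-product scores and a single softmax taken jointly over the object features and the global visual feature. The names `multimodal_attention`, `hidden`, `object_feats`, and `global_feat` are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multimodal_attention(hidden, object_feats, global_feat):
    """Attend jointly over object features and the global visual feature.

    Assumption: dot-product scoring against the decoder hidden state.
    The paper may use a different (e.g. learned) scoring function.
    """
    # Candidates are the intermediate object vectors plus the global image vector.
    candidates = object_feats + [global_feat]
    weights = softmax([dot(hidden, c) for c in candidates])
    # The context vector is the attention-weighted sum of the candidates.
    context = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(hidden))
    ]
    return weights, context


# Toy inputs: a 2-d decoder state, two object vectors, one global vector.
hidden = [1.0, 0.0]
object_feats = [[2.0, 0.0], [0.0, 2.0]]
global_feat = [1.0, 1.0]
weights, context = multimodal_attention(hidden, object_feats, global_feat)
```

In this toy setup the first object aligns best with the decoder state, so it receives the largest weight, the global feature the next largest, and the second object the smallest. In a caption generator, the resulting context vector would be fed to the recurrent decoder at each word step.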
Pages: 889-892
Page count: 4