A Hierarchical Multimodal Attention-based Neural Network for Image Captioning

Cited by: 16
Authors
Cheng, Yong [1]
Huang, Fei [1]
Zhou, Lian [1]
Jin, Cheng [1]
Zhang, Yuejie [1]
Zhang, Tao [2]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Univ Finance & Econ, Sch Informat Management & Engn, Shanghai, Peoples R China
Keywords
Image Captioning; Multimodal Attention; Hierarchical Recurrent Neural Network; Long Short-Term Memory Model
DOI
10.1145/3077136.3080671
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper develops a novel hierarchical multimodal attention-based model to generate more accurate and descriptive captions for images. Our model is an end-to-end neural network containing three related sub-networks: a deep convolutional neural network that encodes image content, a recurrent neural network that identifies the objects in an image sequentially, and a multimodal attention-based recurrent neural network that generates the caption. The main contribution of our work is that the hierarchical structure and the multimodal attention mechanism are applied together, so that each caption word is generated with multimodal attention over both the intermediate semantic objects and the global visual content. Our experiments on two benchmark datasets yielded very positive results.
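The abstract does not give the attention equations, so the following is only a minimal sketch of what one multimodal attention step of this kind could look like. It assumes dot-product scoring, a shared feature dimension, and plain soft attention; the function name `multimodal_attention` and all toy values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multimodal_attention(hidden, object_feats, global_feat):
    """One soft-attention step over object features plus the global feature.

    Assumption (not from the paper): scores are dot products between the
    caption decoder's hidden state and each candidate vector.
    """
    # Candidates: the intermediate semantic objects, then the global visual content.
    candidates = object_feats + [global_feat]
    weights = softmax([dot(hidden, c) for c in candidates])
    # Context vector: attention-weighted sum of all candidates.
    context = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(hidden))
    ]
    return weights, context


# Toy inputs: two object vectors and one global image vector (all hypothetical).
hidden = [1.0, 0.0]
object_feats = [[2.0, 0.0], [0.0, 2.0]]
global_feat = [0.0, 0.0]
weights, context = multimodal_attention(hidden, object_feats, global_feat)
```

In this toy case the first object aligns best with the hidden state, so it receives the largest weight. In a full captioning model the context vector would be fed to the caption-generating recurrent network at each word step.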
Pages: 889-892
Page count: 4
Related papers
50 in total
  • [41] Geospatial relation captioning for high-spatial-resolution images by using an attention-based neural network
    Chen, Jie
    Han, Yarong
    Wan, Li
    Zhou, Xing
    Deng, Min
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2019, 40 (16) : 6482 - 6498
  • [42] MULTIMODAL SEMANTIC ATTENTION NETWORK FOR VIDEO CAPTIONING
    Sun, Liang
    Li, Bing
    Yuan, Chunfeng
    Zha, Zhengjun
    Hu, Weiming
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1300 - 1305
  • [43] Attention Correctness in Neural Image Captioning
    Liu, Chenxi
    Mao, Junhua
    Sha, Fei
    Yuille, Alan
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4176 - 4182
  • [44] Fall Detection with Wearable Sensors: A Hierarchical Attention-based Convolutional Neural Network Approach
    Yu, Shuo
    Chai, Yidong
    Chen, Hsinchun
    Brown, Randall A.
    Sherman, Scott J.
    Nunamaker, Jay F., Jr.
    JOURNAL OF MANAGEMENT INFORMATION SYSTEMS, 2021, 38 (04) : 1095 - 1121
  • [45] Attention-based hierarchical denoised deep clustering network
    Dong, Yongfeng
    Wang, Ziqiu
    Du, Jiapeng
    Fang, Weidong
    Li, Linhao
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2023, 26 (01): : 441 - 459
  • [47] A lightweight attention-based network for image dehazing
    Wei, Yunsong
    Li, Jiaqiang
    Wei, Rongkun
    Lin, Zuxiang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 7271 - 7284
  • [48] A hierarchical contextual attention-based network for sequential recommendation
    Cui, Qiang
    Wu, Shu
    Huang, Yan
    Wang, Liang
    NEUROCOMPUTING, 2019, 358 : 141 - 149
  • [49] Chinese Image Captioning via Fuzzy Attention-based DenseNet-BiLSTM
    Lu, Huimin
    Yang, Rui
    Deng, Zhenrong
    Zhang, Yonglin
    Gao, Guangwei
    Lan, Rushi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [50] Multimodal Brain Image Analysis and Survival Prediction Using Neuromorphic Attention-Based Neural Networks
    Han, Il Song
    BRAINLESION: GLIOMA, MULTIPLE SCLEROSIS, STROKE AND TRAUMATIC BRAIN INJURIES (BRAINLES 2020), PT I, 2021, 12658 : 194 - 206