Learning Double-Level Relationship Networks for image captioning

Cited by: 8
Authors
Wang, Changzhi [1]
Gu, Xiaodong [1]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Local-global relationship; Relationship network; Graph attention network; Attention
DOI
10.1016/j.ipm.2023.103288
Chinese Library Classification (CLC) number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Image captioning aims to generate sentences that describe the main contents of an image. Existing attention-based approaches mainly focus on the salient visual features in the image. However, neglecting to learn the relationship between local and global features may cause local features to lose their interaction with global concepts, producing inappropriate or inaccurate relationship words or phrases in the generated sentences. To alleviate this issue, we propose Double-Level Relationship Networks (DLRN), which exploit the complementary local and global features in the image and enhance the relationships between features. Technically, DLRN builds two types of networks: a separate relationship network and a unified relationship embedding network. The former learns different hierarchies of visual relationships by performing graph attention for local-level and pixel-level relationship enhancement, respectively. The latter uses the global features as a guide to learn the local-global relationship between local regions and global concepts, and obtains a feature representation containing rich relationship information. Further, we devise an attention-based feature fusion module to fully utilize the contributions of different modalities; it fuses the previously obtained relationship features with the original region features. Extensive experiments on three typical datasets verify that DLRN significantly outperforms several state-of-the-art baselines. Moreover, DLRN achieves competitive performance while maintaining notable model efficiency. The source code is available on GitHub at https://github.com/RunCode90/ImageCaptioning.
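The three ideas named in the abstract (attention among local regions, global-guided weighting of regions, and attention-based fusion) can be illustrated with a minimal pure-Python sketch. This is not the DLRN implementation: the function names `relation_enhance`, `global_guided_pool`, and `fuse` are hypothetical, scaled dot-product scores stand in for the learned graph-attention weights, and the toy 2-dimensional features are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a new vector."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]

def relation_enhance(regions):
    """Assumption: a fully connected region graph with scaled dot-product
    scores in place of learned attention parameters."""
    scale = math.sqrt(len(regions[0]))
    enhanced = []
    for query in regions:
        weights = softmax([dot(query, key) / scale for key in regions])
        enhanced.append(weighted_sum(weights, regions))
    return enhanced

def global_guided_pool(regions, global_feat):
    """Use the global feature as the query that weights the local regions."""
    scale = math.sqrt(len(global_feat))
    weights = softmax([dot(global_feat, r) / scale for r in regions])
    return weights, weighted_sum(weights, regions)

def fuse(original, relational):
    """Hypothetical fusion: a per-region softmax gate mixes the original
    region feature with its relation-enhanced counterpart."""
    fused = []
    for o, r in zip(original, relational):
        gate = softmax([dot(o, o), dot(o, r)])
        fused.append([gate[0] * a + gate[1] * b for a, b in zip(o, r)])
    return fused

# Toy inputs (made up): three region features and one global feature.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feat = [0.5, 0.5]

enhanced = relation_enhance(regions)
weights, pooled = global_guided_pool(regions, global_feat)
fused = fuse(regions, enhanced)
```

In this toy setup the third region aligns best with the global feature, so it receives the largest pooling weight. A trained model would replace the raw dot products with learned projections.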
Pages: 24
Related papers
50 in total
  • [21] Noise Augmented Double-Stream Graph Convolutional Networks for Image Captioning
    Wu, Lingxiang
    Xu, Min
    Sang, Lei
    Yao, Ting
    Mei, Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (08) : 3118 - 3127
  • [22] Restricted double-level bayesian classification model
    Shi, Hong-Bo
    Wang, Zhi-Hai
    Huang, Hou-Kuan
    Li, Xiao-Jian
    Ruan Jian Xue Bao/Journal of Software, 2004, 15 (02) : 193 - 199
  • [23] The double-level derotation osteotomy of the knee joint
    Ferner, Felix
    Lutter, Christoph
    Perl, Mario
    Harrer, Joerg
    ZEITSCHRIFT FUR ORTHOPADIE UND UNFALLCHIRURGIE, 2024
  • [24] Visual Relationship Attention for Image Captioning
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [25] IMAGE CAPTIONING WITH WORD LEVEL ATTENTION
    Fang, Fang
    Wang, Hanli
    Tang, Pengjie
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 1278 - 1282
  • [26] Noncontiguous double-level unstable spinal injuries
    Takami M.
    Okada M.
    Enyo Y.
    Iwasaki H.
    Yamada H.
    Yoshida M.
    European Journal of Orthopaedic Surgery & Traumatology, 2017, 27 (1) : 79 - 86
  • [27] Exploring Visual Relationship for Image Captioning
    Yao, Ting
    Pan, Yingwei
    Li, Yehao
    Mei, Tao
    COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 : 711 - 727
  • [28] Sagittal reconstruction of lumbosacral contiguous double-level spondylolytic spondylolisthesis: a comparison of double-level and single-level transforaminal lumbar interbody fusion
    Du, Chang-zhi
    Li, Song
    Xu, Liang
    Zhou, Qing-shuang
    Zhu, Ze-zhang
    Sun, Xu
    Qiu, Yong
    JOURNAL OF ORTHOPAEDIC SURGERY AND RESEARCH, 2019, 14 (1)
  • [29] Sagittal reconstruction of lumbosacral contiguous double-level spondylolytic spondylolisthesis: a comparison of double-level and single-level transforaminal lumbar interbody fusion
    Chang-zhi Du
    Song Li
    Liang Xu
    Qing-shuang Zhou
    Ze-zhang Zhu
    Xu Sun
    Yong Qiu
    Journal of Orthopaedic Surgery and Research, 14
  • [30] Deliberate Attention Networks for Image Captioning
    Gao, Lianli
    Fan, Kaixuan
    Song, Jingkuan
    Liu, Xianglong
    Xu, Xing
    Shen, Heng Tao
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8320 - 8327