Learning Double-Level Relationship Networks for image captioning

Cited by: 8
Authors
Wang, Changzhi [1]
Gu, Xiaodong [1]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Local-global relationship; Relationship network; Graph attention network; Attention
DOI
10.1016/j.ipm.2023.103288
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image captioning aims to generate sentences that describe the main contents of an image. Existing attention-based approaches mainly focus on the salient visual features in the image. However, ignoring the relationship between local features and global features may cause local features to lose their interaction with global concepts, producing inappropriate or inaccurate relationship words or phrases in the sentences. To alleviate this issue, we propose Double-Level Relationship Networks (DLRN), which exploit the complementary local and global features in the image and enhance the relationships between features. Technically, DLRN builds two types of networks: a separate relationship network and a unified relationship embedding network. The former learns different hierarchies of visual relationships by performing graph attention for local-level and pixel-level relationship enhancement, respectively. The latter takes the global features as a guide to learn the local-global relationship between local regions and global concepts, and obtains a feature representation containing rich relationship information. Further, we devise an attention-based feature fusion module to fully utilize the contributions of different modalities; it fuses the previously obtained relationship features with the original region features. Extensive experiments on three typical datasets verify that DLRN significantly outperforms several state-of-the-art baselines. More remarkably, DLRN achieves competitive performance while maintaining notable model efficiency. The source code is available on GitHub at https://github.com/RunCode90/ImageCaptioning.
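The three ideas named in the abstract (graph attention among region features, guidance from a global feature, and attention-based fusion with the original region features) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is in the repository linked in the abstract. The function names, the scaled dot-product scoring, the additive injection of global context, and the sigmoid gate used for fusion are all assumptions made for illustration, and the weights are random rather than learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(nodes, w):
    # Hypothetical single-head attention over a fully connected graph:
    # each region aggregates all regions, weighted by pairwise affinity.
    h = nodes @ w                               # (n, d) projected nodes
    scores = h @ h.T / np.sqrt(h.shape[1])      # (n, n) pairwise affinities
    return softmax(scores, axis=1) @ h          # (n, d) relation-enhanced nodes

def global_guided(nodes, global_feat):
    # Assumed form of global guidance: score each region against the
    # global feature and add the global context in proportion to that score.
    scores = nodes @ global_feat / np.sqrt(nodes.shape[1])   # (n,)
    alpha = softmax(scores, axis=0)                          # sums to 1 over regions
    return nodes + alpha[:, None] * global_feat              # (n, d)

def attention_fusion(rel, region, w_gate):
    # Assumed gated fusion: a sigmoid gate decides, per dimension, how much
    # of the relationship feature versus the original region feature to keep.
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([rel, region], axis=1) @ w_gate)))
    return gate * rel + (1.0 - gate) * region

# Toy data: 5 regions with 8-dimensional features (sizes are arbitrary).
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
global_feat = regions.mean(axis=0)       # stand-in for a global image feature
w = rng.normal(size=(8, 8))              # random projection, untrained
w_gate = rng.normal(size=(16, 8))        # random gate weights, untrained

rel = global_guided(graph_attention(regions, w), global_feat)
fused = attention_fusion(rel, regions, w_gate)
print(fused.shape)  # (5, 8)
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the relationship feature and the original region feature, which is one simple way to keep the original region information available to a downstream caption decoder.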
Pages: 24