Learning Double-Level Relationship Networks for image captioning

Cited by: 8
Authors
Wang, Changzhi [1]
Gu, Xiaodong [1]
Affiliation
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Local-global relationship; Relationship network; Graph attention network; ATTENTION;
DOI
10.1016/j.ipm.2023.103288
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image captioning aims to generate sentences that describe the main content of an image. Existing attention-based approaches mainly focus on the salient visual features in the image. However, if the relationship between local features and global features is not learned, local features may lose their interaction with global concepts, which leads to inappropriate or inaccurate relationship words and phrases in the generated sentences. To alleviate this issue, we propose the Double-Level Relationship Networks (DLRN), which exploits the complementary local and global features in the image and enhances the relationships between features. Technically, DLRN builds two types of networks: a separate relationship network and a unified relationship embedding network. The former learns different hierarchies of visual relationships by performing graph attention for local-level relationship enhancement and pixel-level relationship enhancement, respectively. The latter uses the global features as a guide to learn the local-global relationship between local regions and global concepts, and obtains a feature representation that contains rich relationship information. Further, we devise an attention-based feature fusion module to fully utilize the contribution of different modalities; it fuses the previously obtained relationship features with the original region features. Extensive experiments on three typical datasets verify that DLRN significantly outperforms several state-of-the-art baselines. Moreover, DLRN achieves competitive performance while maintaining notable model efficiency. The source code is available on GitHub at https://github.com/RunCode90/ImageCaptioning.
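The abstract describes two attention steps: global features guiding attention over local regions, and an attention-based fusion of relationship features with the original region features. The sketch below illustrates the general idea only. It is not the authors' DLRN code: it swaps in plain scaled dot-product attention for the graph attention network, and every name in it (`global_guided_attention`, `fuse`, the toy vectors) is a hypothetical chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def global_guided_attention(local_feats, global_feat):
    """Assumption: score each local region by its scaled dot product
    with the global feature, then return the attention weights and
    the weighted sum of the local features."""
    d = len(global_feat)
    scores = [dot(f, global_feat) / math.sqrt(d) for f in local_feats]
    weights = softmax(scores)
    context = [
        sum(w * f[i] for w, f in zip(weights, local_feats))
        for i in range(d)
    ]
    return weights, context


def fuse(region_feat, relation_feat, query):
    """Assumption: fuse two feature vectors with attention weights
    obtained by scoring each vector against a query vector."""
    a, b = softmax([dot(region_feat, query), dot(relation_feat, query)])
    return [a * r + b * s for r, s in zip(region_feat, relation_feat)]


# Toy example: three 2-D region features and one global feature.
local_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feat = [1.0, 1.0]
weights, context = global_guided_attention(local_feats, global_feat)
fused = fuse(local_feats[0], context, global_feat)
```

In this toy setup the third region is most aligned with the global feature, so it receives the largest weight. A real model would learn projection matrices for the scores rather than use raw dot products.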
Pages: 24