ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor

Cited by: 3
Authors
Hossen, Md. Bipul [1]
Ye, Zhongfu [1]
Abdussalam, Amr [1]
Hossain, Mohammad Alamgir [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Anhui, Peoples R China
Keywords
Fine-grained image caption; Attention mechanism; Encoder-decoder; Independent attribute predictor; Enhanced attribute predictor;
DOI
10.1016/j.displa.2024.102798
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Fine-grained image captioning is a focal point in the vision-to-language task and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective prediction and utilization of attributes play a crucial role in enhancing image captioning performance. Prior attribute-related methods have made progress, but they either focus on predicting attributes related to the input image or concentrate on predicting linguistic context-related attributes at each time step in the language model. These approaches often overlook the importance of balancing visual and linguistic contexts, leading to ineffective exploitation of semantic information and a subsequent decline in performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes related to the input image by leveraging relationships between visual objects and attribute embeddings. Following this, an Enhanced Attribute Predictor (EAP) is proposed, which first predicts linguistic context-related attributes and then uses prior probabilities from the IAP module to rebalance image-related and linguistic context-related attributes, thereby generating more robust, enhanced attribute probabilities. These refined attributes are then integrated into the language LSTM layer to ensure accurate word prediction at each time step. The integration of the IAP and EAP modules in the proposed image captioning with enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details, enhancing overall model performance. ICEAP outperforms contemporary models, yielding significant average improvements in CIDEr-D scores of 10.62% on MS-COCO, 9.63% on Flickr30K, and 7.74% on Flickr8K using cross-entropy optimization, with qualitative analysis confirming its ability to generate fine-grained captions.
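The abstract says the EAP rebalances linguistic context-related attribute probabilities using prior probabilities from the IAP, but it does not give the formula. The sketch below is only an illustration of one possible rebalancing scheme, a convex combination of the two probability vectors. The function name `rebalance_attributes`, the `weight` parameter, and the blending rule itself are assumptions made for this example and are not taken from the paper.

```python
from typing import List


def rebalance_attributes(
    linguistic_probs: List[float],
    image_prior: List[float],
    weight: float = 0.5,
) -> List[float]:
    """Blend per-attribute probabilities from two sources.

    linguistic_probs: attribute probabilities predicted from the linguistic
        context at one decoding step (hypothetical EAP intermediate output).
    image_prior: attribute probabilities predicted from the image alone
        (hypothetical IAP output).
    weight: share given to the image prior, in [0, 1]. This blending rule is
        an assumption for illustration, not the paper's formula.
    """
    if len(linguistic_probs) != len(image_prior):
        raise ValueError("both inputs must cover the same attribute vocabulary")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Convex combination keeps every blended value inside [0, 1].
    return [
        weight * p_img + (1.0 - weight) * p_lang
        for p_lang, p_img in zip(linguistic_probs, image_prior)
    ]


if __name__ == "__main__":
    # Toy two-attribute vocabulary; the numbers are made up for the example.
    step_probs = [0.2, 0.8]
    prior = [0.6, 0.4]
    print(rebalance_attributes(step_probs, prior, weight=0.5))
```

With `weight=0.0` the result equals the linguistic probabilities, and with `weight=1.0` it equals the image prior, so the single parameter makes the visual-versus-linguistic trade-off explicit.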
Pages: 18
Related papers
50 in total
  • [31] Subtler mixed attention network on fine-grained image classification
    Chao Liu
    Lei Huang
    Zhiqiang Wei
    Wenfeng Zhang
    Applied Intelligence, 2021, 51 : 7903 - 7916
  • [32] Feature Correlation Residual Network for Fine-Grained Image Recognition
    Xu, Jiazhen
    Wei, Yantao
    Deng, Wei
    IEEE ACCESS, 2020, 8 : 214322 - 214331
  • [33] High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (07) : 1681 - 1693
  • [34] CASCADE ATTENTION FUSION FOR FINE-GRAINED IMAGE CAPTIONING BASED ON MULTI-LAYER LSTM
    Wang, Shuang
    Meng, Yun
    Gu, Yu
    Zhang, Lei
    Ye, Xiutiao
    Tian, Jingxian
    Jiao, Licheng
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2245 - 2249
  • [35] Attention-Guided Hierarchical Parsing for Fine-Grained Person-Centric Image Captioning
    Gu, Zhengcheng
    Jin, Jing
    IEEE ACCESS, 2024, 12 : 86293 - 86301
  • [36] Fine-Grained Length Controllable Video Captioning With Ordinal Embeddings
    Nitta, Tomoya
    Fukuzawa, Takumi
    Tamaki, Toru
    IEEE ACCESS, 2024, 12 : 189667 - 189688
  • [37] Text-Enhanced Attribute-Based Attention for Generalized Zero-Shot Fine-Grained Image Classification
    Chen, Yan-He
    Yeh, Mei-Chen
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 447 - 450
  • [38] PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control
    Parihar, Rishubh
    Sachidanand, V. S.
    Mani, Sabraswaran
    Karmali, Tejan
    Babu, R. Venkatesh
    COMPUTER VISION-ECCV 2024, PT LXXXII, 2025, 15140 : 469 - 487
  • [39] Multi-Task Attribute-Fusion Model for Fine-grained Image Recognition
    Li, Mengze
    Kong, Ming
    Kuang, Kun
    Zhu, Qiang
    Wu, Fei
    OPTOELECTRONIC IMAGING AND MULTIMEDIA TECHNOLOGY VII, 2020, 11550
  • [40] Attribute hierarchy based multi-task learning for fine-grained image classification
    Zhao, Junjie
    Peng, Yuxin
    He, Xiangteng
    NEUROCOMPUTING, 2020, 395 : 150 - 159