Learning Object Context for Dense Captioning

Cited: 0
Authors
Li, Xiangyang [1 ,2 ]
Jiang, Shuqiang [1 ,2 ]
Han, Jungong [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Univ Lancaster, Sch Comp & Commun, Lancaster, England
Funding
National Natural Science Foundation of China;
Keywords
DOI
N/A
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Dense captioning is a challenging task which not only detects visual elements in images but also generates natural language sentences to describe them. Previous approaches do not leverage object information in images for this task. However, objects provide valuable cues to help predict the locations of caption regions, as caption regions often highly overlap with objects (i.e., caption regions are usually parts of objects or combinations of them). Meanwhile, objects also provide important information for describing a target caption region, as the corresponding description not only depicts its properties but also involves its interactions with objects in the image. In this work, we propose a novel scheme with an object context encoding Long Short-Term Memory (LSTM) network to automatically learn complementary object context for each caption region, transferring knowledge from objects to caption regions. All contextual objects are arranged as a sequence and progressively fed into the context encoding module to obtain context features. Then both the learned object context features and region features are used to predict the bounding box offsets and generate the descriptions. The context learning procedure is carried out jointly with the optimization of both location prediction and caption generation, thus enabling the object context encoding LSTM to capture and aggregate useful object context. Experiments on benchmark datasets demonstrate the superiority of our proposed approach over the state-of-the-art methods.
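The core encoding step described in the abstract — feeding contextual object features into an LSTM one at a time and using the final hidden state as the aggregated context feature, which is then combined with the region feature — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, feature dimensions, and single-layer numpy LSTM cell are all assumptions for the sake of a self-contained example.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    # One LSTM step: gate pre-activations from input x and previous hidden h.
    z = W @ x + U @ h + b                 # shape (4H,): [i | f | o | g]
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))      # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))   # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H])) # output gate
    g = np.tanh(z[3*H:])                  # candidate cell update
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def encode_object_context(object_feats, W, U, b, hidden_size):
    # Contextual objects are arranged as a sequence and progressively fed
    # into the encoder; the final hidden state is the learned context feature.
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in object_feats:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

# Toy dimensions (hypothetical): 5 contextual objects, 8-d object/region
# features, 4-d context feature.
rng = np.random.default_rng(0)
D, H, N = 8, 4, 5
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
objects = rng.normal(size=(N, D))

context = encode_object_context(objects, W, U, b, H)
region_feat = rng.normal(size=D)
# The context feature and region feature are combined (here, concatenated)
# and would feed two heads: bounding-box offset prediction and captioning.
fused = np.concatenate([region_feat, context])
print(fused.shape)
```

In training, the LSTM parameters would be optimized jointly with the localization and captioning losses, which is what lets the encoder learn to aggregate useful object context rather than being a fixed pooling step.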
Pages: 8650-8657
Page count: 8
Related Papers
50 records
  • [1] Context and Attribute Grounded Dense Captioning
    Yin, Guojun
    Sheng, Lu
    Liu, Bin
    Yu, Nenghai
    Wang, Xiaogang
    Shao, Jing
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6234 - 6243
  • [2] Multimodal object description network for dense captioning
    Wang, Weixuan
    Hu, Haifeng
    ELECTRONICS LETTERS, 2017, 53 (15) : 1041 - +
  • [3] Dense Captioning with Joint Inference and Visual Context
    Yang, Linjie
    Tang, Kevin
    Yang, Jianchao
    Li, Li-Jia
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1978 - 1987
  • [4] Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
    Wang, Jingwen
    Jiang, Wenhao
    Ma, Lin
    Liu, Wei
    Xu, Yong
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7190 - 7198
  • [5] Textual Context-Aware Dense Captioning With Diverse Words
    Shao, Zhuang
    Han, Jungong
    Debattista, Kurt
    Pang, Yanwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8753 - 8766
  • [6] Multimodal Context Fusion Based Dense Video Captioning Algorithm
    Li, Meiqi
    Zhou, Ziwei
    ENGINEERING LETTERS, 2025, 33 (04) : 1061 - 1072
  • [7] An Object Localization-based Dense Image Captioning Framework in Hindi
    Mishra, Santosh Kumar
    Harshit
    Saha, Sriparna
    Bhattacharyya, Pushpak
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (02)
  • [8] Hierarchical Context-aware Network for Dense Video Event Captioning
    Ji, Lei
    Guo, Xianglin
    Huang, Haoyang
    Chen, Xilin
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2004 - 2013
  • [9] Region-Object Relation-Aware Dense Captioning via Transformer
    Shao, Zhuang
    Han, Jungong
    Marnerides, Demetris
    Debattista, Kurt
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [10] Exploring Dense Context for Salient Object Detection
    Mei, Haiyang
    Liu, Yuanyuan
    Wei, Ziqi
    Zhou, Dongsheng
    Wei, Xiaopeng
    Zhang, Qiang
    Yang, Xin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1378 - 1389