OCR-oriented Master Object for Text Image Captioning

Cited by: 6
Authors
Tang, Wenliang [1]
Hu, Zhenzhen [1]
Song, Zijie [1]
Hong, Richang [1]
Affiliations
[1] Hefei Univ Technol, Hefei, Peoples R China
Keywords
Text image captioning; graph convolution network; scene graph
DOI
10.1145/3512527.3531431
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text image captioning aims to understand the scene text in images for image caption generation. The key issue of this challenging task is to understand the relationship between the text OCR tokens and images. In this paper, we propose a novel text image captioning method by purifying the OCR-oriented scene graph with the master object. The master object is the object to which the OCR is attached, which is the semantic relationship bridge between the OCR token and the image. We consider the master object as a proxy to connect OCR tokens and other regions in the image. By exploring the master object for each OCR token, we build the purified scene graph based on the master objects and then enrich the visual embedding by the Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed the hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test dataset demonstrate the effectiveness of the proposed method.
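The pipeline outlined in the abstract (find a master object for each OCR token, build a purified graph around those master objects, then enrich the embeddings with a GCN) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the master object is chosen as the object box that covers the largest fraction of the OCR box, object-to-object edges are supplied by the caller, and the GCN step uses the common symmetric normalization. The function names `find_master_objects`, `build_purified_adjacency`, and `gcn_layer` are hypothetical, and the OCR clustering step is omitted.

```python
import math

def overlap_ratio(ocr_box, obj_box):
    """Fraction of the OCR box area covered by the object box (x1, y1, x2, y2)."""
    ix1, iy1 = max(ocr_box[0], obj_box[0]), max(ocr_box[1], obj_box[1])
    ix2, iy2 = min(ocr_box[2], obj_box[2]), min(ocr_box[3], obj_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (ocr_box[2] - ocr_box[0]) * (ocr_box[3] - ocr_box[1])
    return inter / area if area > 0 else 0.0

def find_master_objects(ocr_boxes, obj_boxes):
    """ASSUMPTION: the master object is the object covering most of the OCR box."""
    masters = []
    for ocr in ocr_boxes:
        if not obj_boxes:
            masters.append(None)
            continue
        ratios = [overlap_ratio(ocr, obj) for obj in obj_boxes]
        best = max(range(len(obj_boxes)), key=lambda j: ratios[j])
        # An OCR token that overlaps no object gets no master object.
        masters.append(best if ratios[best] > 0 else None)
    return masters

def build_purified_adjacency(num_objs, masters, obj_edges):
    """Nodes 0..num_objs-1 are objects; OCR token t is node num_objs + t.

    ASSUMPTION: each OCR token is linked only to its master object, so it
    reaches the rest of the image through that object.
    """
    n = num_objs + len(masters)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loops keep each node's own features
    for a, b in obj_edges:
        adj[a][b] = adj[b][a] = 1.0
    for t, m in enumerate(masters):
        if m is not None:
            k = num_objs + t
            adj[k][m] = adj[m][k] = 1.0
    return adj

def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 A D^-1/2 H W), with self-loops already in A."""
    n, d_in, d_out = len(adj), len(weight), len(weight[0])
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(d_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)] for i in range(n)]

# Toy example: two objects and two OCR tokens, one inside each object.
obj_boxes = [(0, 0, 100, 100), (150, 0, 300, 100)]
ocr_boxes = [(10, 10, 50, 30), (160, 20, 220, 40)]
masters = find_master_objects(ocr_boxes, obj_boxes)
adj = build_purified_adjacency(len(obj_boxes), masters, obj_edges=[(0, 1)])
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # made-up 2-d embeddings
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely illustrative
enriched = gcn_layer(adj, feats, weight)
```

In this toy setup each OCR token exchanges information directly with its master object only, so any influence from other image regions has to pass through that object, which is the "proxy" role the abstract describes.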
Pages: 39-43
Page count: 5