OCR-oriented Master Object for Text Image Captioning

Times cited: 6
Authors
Tang, Wenliang [1]
Hu, Zhenzhen [1]
Song, Zijie [1]
Hong, Richang [1]
Affiliation
[1] Hefei University of Technology, Hefei, People's Republic of China
Keywords
Text image captioning; graph convolution network; scene graph
DOI
10.1145/3512527.3531431
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Text image captioning aims to generate image captions that reflect an understanding of the scene text in the image. The key challenge of this task is understanding the relationship between the OCR tokens and the image. In this paper, we propose a novel text image captioning method that purifies the OCR-oriented scene graph using master objects. The master object of an OCR token is the object to which that token is attached, and it serves as the semantic bridge between the OCR token and the image. We therefore treat the master object as a proxy that connects OCR tokens to the other regions in the image. After identifying the master object for each OCR token, we build a purified scene graph around the master objects and then enrich the visual embeddings with a Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed this hierarchical information into the model to provide a richer representation. Experiments on the TextCaps validation and test sets demonstrate the effectiveness of the proposed method.
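The abstract does not specify how master objects are detected, which edges the purified graph keeps, or how the GCN is configured. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation. It makes three assumptions:

- The master object of an OCR token is the detected object whose box contains the largest fraction of the token's box.
- The purified graph keeps only two kinds of edges: edges between each OCR token and its master object, and edges between master objects and the other object regions.
- A single standard GCN layer with symmetric normalization is used, computing ReLU(D^-1/2 Â D^-1/2 X W) with Â = A + I.

All function names here are hypothetical. The OCR-token clustering step is not sketched.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Matrix = List[List[float]]


def containment(ocr: Box, obj: Box) -> float:
    """Fraction of the OCR box area that lies inside the object box."""
    ix1, iy1 = max(ocr[0], obj[0]), max(ocr[1], obj[1])
    ix2, iy2 = min(ocr[2], obj[2]), min(ocr[3], obj[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (ocr[2] - ocr[0]) * (ocr[3] - ocr[1])
    return inter / area if area > 0 else 0.0


def assign_master_objects(ocr_boxes: List[Box], obj_boxes: List[Box]) -> List[int]:
    """Hypothetical rule: each OCR token's master object is the object box
    that contains the largest fraction of the token's box (needs >= 1 object)."""
    return [
        max(range(len(obj_boxes)), key=lambda j: containment(o, obj_boxes[j]))
        for o in ocr_boxes
    ]


def build_adjacency(num_obj: int, masters: List[int]) -> Matrix:
    """Assumed purified graph. Nodes 0..num_obj-1 are objects and the
    remaining nodes are OCR tokens. Edges: OCR token <-> its master object,
    and master object <-> every other object. Self-loops give A + I."""
    n = num_obj + len(masters)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0
    for k, m in enumerate(masters):
        t = num_obj + k
        adj[t][m] = adj[m][t] = 1.0
    for m in set(masters):
        for j in range(num_obj):
            if j != m:
                adj[m][j] = adj[j][m] = 1.0
    return adj


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W); adj already has self-loops."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # X W: project every node feature with the shared weight matrix.
    xw = [
        [sum(x[k] * weight[k][c] for k in range(len(x))) for c in range(len(weight[0]))]
        for x in feats
    ]
    # Aggregate normalized neighbour features, then apply ReLU.
    return [
        [max(0.0, sum(norm[i][j] * xw[j][c] for j in range(n))) for c in range(len(xw[0]))]
        for i in range(n)
    ]


if __name__ == "__main__":
    objects = [(0, 0, 10, 10), (0, 8, 30, 20)]   # e.g. a bottle and a table
    ocr_tokens = [(2, 2, 6, 4), (12, 10, 16, 12)]  # text on the bottle, text on the table
    masters = assign_master_objects(ocr_tokens, objects)
    print(masters)  # [0, 1]
    adj = build_adjacency(len(objects), masters)
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    print(gcn_layer(adj, feats, [[1.0, 0.0], [0.0, 1.0]]))
```

Routing every OCR token through its master object keeps the graph sparse. Text features then reach other image regions only via the object the text is attached to, which is one plausible reading of "purified" in the abstract.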
Pages: 39-43
Page count: 5