OCR-oriented Master Object for Text Image Captioning

Cited by: 6
Authors
Tang, Wenliang [1]
Hu, Zhenzhen [1]
Song, Zijie [1]
Hong, Richang [1]
Affiliations
[1] Hefei University of Technology, Hefei, People's Republic of China
Keywords
Text image captioning; graph convolution network; scene graph
DOI
10.1145/3512527.3531431
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text image captioning aims to understand the scene text in images for image caption generation. The key issue in this challenging task is understanding the relationship between the OCR tokens and the image. In this paper, we propose a novel text image captioning method that purifies the OCR-oriented scene graph with the master object. The master object is the object to which the OCR token is attached, and it serves as the semantic bridge between the OCR token and the image. We consider the master object as a proxy to connect OCR tokens and other regions in the image. By exploring the master object for each OCR token, we build the purified scene graph based on the master objects and then enrich the visual embedding with a Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed in the resulting hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test sets demonstrate the effectiveness of the proposed method.
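The abstract does not say how a master object is selected or how the purified graph is wired, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical overlap heuristic (the master object is the detected object that covers most of the OCR box), connects each OCR token only to its master object, connects each master object to the other objects, and applies one standard GCN layer with symmetric normalization. All function names and the toy boxes are illustrative.

```python
import math

def overlap_ratio(ocr_box, obj_box):
    """Fraction of the OCR box covered by the object box; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(ocr_box[0], obj_box[0]), max(ocr_box[1], obj_box[1])
    ix2, iy2 = min(ocr_box[2], obj_box[2]), min(ocr_box[3], obj_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (ocr_box[2] - ocr_box[0]) * (ocr_box[3] - ocr_box[1])
    return inter / area if area > 0 else 0.0

def find_master_objects(ocr_boxes, obj_boxes):
    """Assumed heuristic: pick the object index with the largest overlap for each OCR box."""
    return [max(range(len(obj_boxes)), key=lambda j: overlap_ratio(box, obj_boxes[j]))
            for box in ocr_boxes]

def purified_adjacency(num_objs, masters):
    """Nodes 0..num_objs-1 are objects, OCR tokens follow. OCR tokens link only to their master."""
    n = num_objs + len(masters)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for m in set(masters):
        for j in range(num_objs):  # the master object bridges to the other objects
            adj[m][j] = adj[j][m] = 1.0
    for k, m in enumerate(masters):
        token = num_objs + k
        adj[token][m] = adj[m][token] = 1.0
    return adj

def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 A D^-1/2 H W), where A already includes self-loops."""
    n, d_in, d_out = len(adj), len(weight), len(weight[0])
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(d_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)] for i in range(n)]

# Toy example: two objects and one OCR token lying inside the first object.
obj_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
ocr_boxes = [(2, 2, 6, 4)]
masters = find_master_objects(ocr_boxes, obj_boxes)
adj = purified_adjacency(len(obj_boxes), masters)
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up 2-d node features
weight = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for readability
enriched = gcn_layer(adj, feats, weight)
```

In this toy graph the OCR token has no direct edge to the second object, so information from that region can reach it only through the master object, which is the proxy role the abstract describes.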
Pages: 39-43
Page count: 5