OCR-oriented Master Object for Text Image Captioning

Cited by: 6
Authors
Tang, Wenliang [1]
Hu, Zhenzhen [1]
Song, Zijie [1]
Hong, Richang [1]
Affiliations
[1] Hefei University of Technology, Hefei, People's Republic of China
Keywords
Text image captioning; graph convolution network; scene graph
DOI
10.1145/3512527.3531431
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text image captioning aims to understand the scene text in images for image caption generation. The key issue of this challenging task is to understand the relationship between the OCR tokens and the image. In this paper, we propose a novel text image captioning method by purifying the OCR-oriented scene graph with the master object. The master object is the object to which the OCR token is attached, and it serves as the semantic bridge between the OCR token and the image. We consider the master object as a proxy to connect OCR tokens and other regions in the image. By exploring the master object for each OCR token, we build the purified scene graph based on the master objects and then enrich the visual embedding with a Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed the hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test sets demonstrate the effectiveness of the proposed method.
Pages: 39-43
Number of pages: 5
Related papers
50 in total
  • [1] COME: Clip-OCR and Master ObjEct for text image captioning
    Lv, Gang
    Sun, Yining
    Nian, Fudong
    Zhu, Maofei
    Tang, Wenliang
    Hu, Zhenzhen
    IMAGE AND VISION COMPUTING, 2023, 136
  • [2] Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning
    Wang, Jing
    Tang, Jinhui
    Luo, Jiebo
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4346 - 4354
  • [3] OBJECT-ORIENTED BACKDOOR ATTACK AGAINST IMAGE CAPTIONING
    Li, Meiling
    Zhong, Nan
    Zhang, Xinpeng
    Qian, Zhenxing
    Li, Sheng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2864 - 2868
  • [4] Object Hallucination in Image Captioning
    Rohrbach, Anna
    Hendricks, Lisa Anne
    Burns, Kaylee
    Darrell, Trevor
    Saenko, Kate
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4035 - 4045
  • [5] Text to Image Synthesis for Improved Image Captioning
    Hossain, Md. Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    IEEE ACCESS, 2021, 9 : 64918 - 64928
  • [6] Exploring coherence from heterogeneous representations for OCR image captioning
    Zhang, Yao
    Song, Zijie
    Hu, Zhenzhen
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [7] Object semantic analysis for image captioning
    Sen Du
    Hong Zhu
    Guangfeng Lin
    Dong Wang
    Jing Shi
    Jing Wang
    Multimedia Tools and Applications, 2023, 82 : 43179 - 43206
  • [8] Object Modifier Generation for Image Captioning
    Liao, Lidou
    Song, Yonghong
    Zhang, Yuanlin
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 52 - 57
  • [9] Object semantic analysis for image captioning
    Du, Sen
    Zhu, Hong
    Lin, Guangfeng
    Wang, Dong
    Shi, Jing
    Wang, Jing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (28) : 43179 - 43206
  • [10] Image Captioning with Object Detection and Localization
    Yang, Zhongliang
    Zhang, Yu-Jin
    Rehman, Sadaqat Ur
    Huang, Yongfeng
    IMAGE AND GRAPHICS (ICIG 2017), PT II, 2017, 10667 : 109 - 118