OCR-oriented Master Object for Text Image Captioning

Cited by: 6

Authors:

Tang, Wenliang [1]

Hu, Zhenzhen [1]

Song, Zijie [1]

Hong, Richang [1]

Affiliations:

[1] Hefei University of Technology, Hefei, People's Republic of China

Source:

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022

Keywords:

Text image captioning; graph convolution network; scene graph

DOI:

10.1145/3512527.3531431

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Text image captioning aims to understand the scene text in images for image caption generation. The key issue of this challenging task is to understand the relationship between the OCR text tokens and the image. In this paper, we propose a novel text image captioning method that purifies the OCR-oriented scene graph with the master object. The master object is the object to which an OCR token is attached, and it serves as the semantic bridge between the OCR token and the image. We consider the master object as a proxy to connect OCR tokens and other regions in the image. By exploring the master object for each OCR token, we build the purified scene graph based on the master objects and then enrich the visual embedding with a Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed in the resulting hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test sets demonstrate the effectiveness of the proposed method.
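
The pipeline described in the abstract (find a master object for each OCR token, keep only master-object edges, then run a GCN) can be illustrated with a minimal sketch. This is not the authors' implementation: the containment heuristic for choosing a master object, the graph that links each OCR token only to its master object, the single normalized GCN layer, and all function names below are assumptions made for illustration. The OCR-token clustering step is not covered.

```python
import math

def overlap_ratio(ocr_box, obj_box):
    """Fraction of the OCR box (x1, y1, x2, y2) that lies inside the object box."""
    x1, y1 = max(ocr_box[0], obj_box[0]), max(ocr_box[1], obj_box[1])
    x2, y2 = min(ocr_box[2], obj_box[2]), min(ocr_box[3], obj_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (ocr_box[2] - ocr_box[0]) * (ocr_box[3] - ocr_box[1])
    return inter / area if area > 0 else 0.0

def find_master_objects(ocr_boxes, obj_boxes):
    """Assumed heuristic: the master object is the object covering most of the OCR box."""
    masters = []
    for ocr in ocr_boxes:
        ratios = [overlap_ratio(ocr, obj) for obj in obj_boxes]
        masters.append(max(range(len(ratios)), key=ratios.__getitem__))
    return masters

def build_purified_adjacency(masters, num_objects):
    """Nodes are objects first, then OCR tokens; each token links only to its master."""
    n = num_objects + len(masters)
    adj = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # self-loops
    for i, m in enumerate(masters):
        adj[num_objects + i][m] = 1.0
        adj[m][num_objects + i] = 1.0
    return adj

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(features, adj, weight):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W), self-loops already in A."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    norm = [[adj[r][c] / math.sqrt(deg[r] * deg[c]) for c in range(n)] for r in range(n)]
    hidden = matmul(matmul(norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

# Toy example: two detected objects and three OCR tokens (made-up boxes).
obj_boxes = [[0, 0, 100, 100], [120, 0, 220, 100]]
ocr_boxes = [[10, 10, 40, 30], [130, 20, 200, 60], [60, 60, 90, 90]]
masters = find_master_objects(ocr_boxes, obj_boxes)
print(masters)  # [0, 1, 0]

adj = build_purified_adjacency(masters, num_objects=len(obj_boxes))
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weight keeps the example easy to inspect
enriched = gcn_layer(features, adj, weight)
```

In this sketch each OCR token's enriched embedding mixes its own features with those of its master object only, which is one simple way to read "purified"; a real system would use learned weights and detector features rather than the toy values above.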


Pages: 39-43

Page count: 5

