OCR-oriented Master Object for Text Image Captioning

Cited by: 6

Authors:

Tang, Wenliang [1]

Hu, Zhenzhen [1]

Song, Zijie [1]

Hong, Richang [1]

Affiliations:

[1] Hefei University of Technology, Hefei, People's Republic of China

Source:

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022

Keywords:

Text image captioning; graph convolution network; scene graph

DOI:

10.1145/3512527.3531431

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Text image captioning aims to generate image captions that take the scene text in an image into account. The key challenge of this task is understanding the relationship between the OCR tokens and the image. In this paper, we propose a novel text image captioning method that purifies the OCR-oriented scene graph using the master object. The master object is the object to which an OCR token is attached; it serves as the semantic bridge between the OCR token and the image. We treat the master object as a proxy that connects OCR tokens with other regions in the image. By identifying the master object for each OCR token, we build a purified scene graph based on the master objects and then enrich the visual embedding with a Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed in the resulting hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test sets demonstrate the effectiveness of the proposed method.
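
The pipeline described in the abstract (assign each OCR token a master object, build a purified graph, then propagate features with a GCN) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how the master object is chosen, so the `coverage` heuristic (the object box that contains the largest share of the OCR box) is an assumption, as are all function names, the box format `(x1, y1, x2, y2)`, and the use of the standard symmetric GCN normalization. The OCR clustering step is not covered.

```python
from math import sqrt

def coverage(ocr_box, obj_box):
    """Fraction of the OCR box area lying inside the object box (assumed heuristic)."""
    ox1, oy1, ox2, oy2 = ocr_box
    bx1, by1, bx2, by2 = obj_box
    iw = max(0.0, min(ox2, bx2) - max(ox1, bx1))  # overlap width
    ih = max(0.0, min(oy2, by2) - max(oy1, by1))  # overlap height
    area = (ox2 - ox1) * (oy2 - oy1)
    return (iw * ih) / area if area > 0 else 0.0

def master_objects(ocr_boxes, obj_boxes):
    """For each OCR token, index of the object box that covers it best."""
    return [
        max(range(len(obj_boxes)), key=lambda j: coverage(box, obj_boxes[j]))
        for box in ocr_boxes
    ]

def purified_adjacency(n_obj, masters):
    """Adjacency with self-loops; object nodes come first, then OCR nodes.

    Only OCR-token <-> master-object edges are kept (hypothetical purification).
    """
    n = n_obj + len(masters)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t, m in enumerate(masters):
        adj[n_obj + t][m] = 1.0
        adj[m][n_obj + t] = 1.0
    return adj

def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D^-1/2 A D^-1/2 X W), where A already has self-loops."""
    n, d_in, d_out = len(adj), len(weight), len(weight[0])
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(d_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][d] * weight[d][c] for d in range(d_in)))
             for c in range(d_out)] for i in range(n)]

# Toy usage: two objects, two OCR tokens, 2-d features, identity weights.
obj_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
ocr_boxes = [(2, 2, 4, 4), (22, 3, 28, 6)]
masters = master_objects(ocr_boxes, obj_boxes)
adj = purified_adjacency(len(obj_boxes), masters)
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
enriched = gcn_layer(adj, feats, [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup each OCR node starts with a zero feature vector and, after one propagation step, picks up a share of its master object's features, which is the sense in which the master object acts as a proxy between OCR tokens and the rest of the image.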

Citation

Pages: 39-43

Page count: 5
