OCR-oriented Master Object for Text Image Captioning

Cited by: 6

Authors:

Tang, Wenliang [1]

Hu, Zhenzhen [1]

Song, Zijie [1]

Hong, Richang [1]

Affiliations:

[1] Hefei University of Technology, Hefei, People's Republic of China

Source:

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022

Keywords:

Text image captioning; graph convolution network; scene graph

DOI:

10.1145/3512527.3531431

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Text image captioning aims to understand the scene text in images for image caption generation. The key issue of this challenging task is to understand the relationship between the text OCR tokens and images. In this paper, we propose a novel text image captioning method by purifying the OCR-oriented scene graph with the master object. The master object is the object to which the OCR is attached, which is the semantic relationship bridge between the OCR token and the image. We consider the master object as a proxy to connect OCR tokens and other regions in the image. By exploring the master object for each OCR token, we build the purified scene graph based on the master objects and then enrich the visual embedding by the Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed the hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test dataset demonstrate the effectiveness of the proposed method.


Pages: 39-43

Page count: 5

