OCR-oriented Master Object for Text Image Captioning

Cited by: 6

Authors:

Tang, Wenliang [1]

Hu, Zhenzhen [1]

Song, Zijie [1]

Hong, Richang [1]

Affiliation:

[1] Hefei University of Technology, Hefei, People's Republic of China

Source:

Proceedings of the 2022 International Conference on Multimedia Retrieval, ICMR 2022 | 2022

Keywords:

Text image captioning; graph convolution network; scene graph

DOI:

10.1145/3512527.3531431

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Text image captioning aims to understand the scene text in images for image caption generation. The key issue of this challenging task is to understand the relationship between the text OCR tokens and images. In this paper, we propose a novel text image captioning method by purifying the OCR-oriented scene graph with the master object. The master object is the object to which the OCR is attached, which is the semantic relationship bridge between the OCR token and the image. We consider the master object as a proxy to connect OCR tokens and other regions in the image. By exploring the master object for each OCR token, we build the purified scene graph based on the master objects and then enrich the visual embedding by the Graph Convolution Network (GCN). Furthermore, we cluster the OCR tokens and feed the hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test dataset demonstrate the effectiveness of the proposed method.
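
The abstract's idea of routing each OCR token through its master object before graph convolution can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the box-containment rule for picking a master object, the choice to connect all objects to each other, the mean-aggregation GCN layer, and every function name here are assumptions made for the example.

```python
# Illustrative sketch only. The containment heuristic, the graph layout,
# and all names below are assumptions, not the paper's actual method.

def containment(ocr_box, obj_box):
    """Fraction of the OCR box area that lies inside the object box.
    Boxes are (x1, y1, x2, y2)."""
    x1 = max(ocr_box[0], obj_box[0])
    y1 = max(ocr_box[1], obj_box[1])
    x2 = min(ocr_box[2], obj_box[2])
    y2 = min(ocr_box[3], obj_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (ocr_box[2] - ocr_box[0]) * (ocr_box[3] - ocr_box[1])
    return inter / area if area > 0 else 0.0


def assign_master_objects(ocr_boxes, obj_boxes):
    """For each OCR box, return the index of the object box covering it best."""
    return [
        max(range(len(obj_boxes)), key=lambda j: containment(ocr, obj_boxes[j]))
        for ocr in ocr_boxes
    ]


def build_adjacency(num_objs, masters):
    """Nodes are objects first, then OCR tokens.
    Assumed layout: objects are connected to each other, while each OCR
    token is connected only to its master object (plus a self-loop)."""
    n = num_objs + len(masters)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1  # self-loop
    for i in range(num_objs):
        for j in range(num_objs):
            adj[i][j] = 1
    for t, m in enumerate(masters):
        adj[num_objs + t][m] = 1
        adj[m][num_objs + t] = 1
    return adj


def gcn_layer(adj, feats, weight):
    """One simplified graph convolution: mean over neighbors, linear map, ReLU."""
    n, dim = len(adj), len(feats[0])
    out = []
    for i in range(n):
        deg = sum(adj[i])  # at least 1 because of the self-loop
        agg = [sum(adj[i][k] * feats[k][d] for k in range(n)) / deg
               for d in range(dim)]
        out.append([
            max(0.0, sum(agg[d] * weight[d][c] for d in range(dim)))
            for c in range(len(weight[0]))
        ])
    return out


# Toy input: two objects, one OCR token sitting on each of them.
obj_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
ocr_boxes = [(2, 2, 4, 4), (22, 3, 28, 5)]
masters = assign_master_objects(ocr_boxes, obj_boxes)
adj = build_adjacency(len(obj_boxes), masters)
feats = [[1, 0], [0, 1], [1, 1], [0, 0]]   # made-up 2-d node features
weight = [[1, 0], [0, 1]]                  # identity projection for clarity
enriched = gcn_layer(adj, feats, weight)
```

In this toy graph an OCR token never links directly to an object other than its master, so information from the rest of the image reaches it only through that master object, which is the proxy role the abstract describes.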

Pages: 39-43

Page count: 5
