Text Augmentation Using BERT for Image Captioning

Cited by: 15
|
Authors
Atliha, Viktar [1]
Sesok, Dmitrij [1]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 17
Keywords
image captioning; augmentation; BERT;
DOI
10.3390/app10175978
Chinese Library Classification
O6 [Chemistry];
Discipline code
0703 ;
Abstract
Image captioning is an important task both for improving human-computer interaction and for gaining a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, using convolutional networks to encode images and recurrent networks to decode them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality when data are lacking: generating a variety of descriptions of objects in different situations requires a large training set. The datasets in common use, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline, and augmentation with the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
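The baseline augmentation described in the abstract, replacing caption words with synonyms, can be sketched as follows. This is an illustrative sketch only: the synonym table and function names are hypothetical stand-ins, not the paper's implementation, and the paper's stronger BERT variant instead masks a word and lets a masked language model propose context-aware replacements.

```python
import random

# Hypothetical synonym table for illustration; the paper does not
# publish its lexicon. The BERT variant would replace this lookup
# with masked-token predictions from a pretrained language model.
SYNONYMS = {
    "big": ["large", "huge"],
    "dog": ["puppy", "hound"],
    "runs": ["sprints", "dashes"],
}

def augment_caption(caption, n_variants=2, seed=0):
    """Generate caption variants by swapping one word for a synonym."""
    rng = random.Random(seed)
    words = caption.split()
    # Indices of words that have an entry in the synonym table.
    replaceable = [i for i, w in enumerate(words) if w in SYNONYMS]
    variants = []
    for _ in range(n_variants):
        if not replaceable:
            break
        i = rng.choice(replaceable)
        new_words = list(words)
        new_words[i] = rng.choice(SYNONYMS[words[i]])
        variants.append(" ".join(new_words))
    return variants

print(augment_caption("a big dog runs on the grass"))
```

Each variant keeps the sentence structure and changes a single word, which multiplies the number of distinct captions per image without collecting new annotations.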
Pages: 11
Related Papers
50 records total
  • [31] A Text-Guided Generation and Refinement Model for Image Captioning
    Wang, Depeng
    Hu, Zhenzhen
    Zhou, Yuanen
    Hong, Richang
    Wang, Meng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2966 - 2977
  • [32] OCR-oriented Master Object for Text Image Captioning
    Tang, Wenliang
    Hu, Zhenzhen
    Song, Zijie
    Hong, Richang
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 39 - 43
  • [33] Visual-Text Reference Pretraining Model for Image Captioning
    Li, Pengfei
    Zhang, Min
    Lin, Peijie
    Wan, Jian
    Jiang, Ming
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [34] Learning Text-to-Video Retrieval from Image Captioning
    Ventura, Lucas
    Schmid, Cordelia
    Varol, Gul
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 1834 - 1854
  • [35] VIXEN: Visual Text Comparison Network for Image Difference Captioning
    Black, Alexander
    Shi, Jing
    Fan, Yifei
    Bui, Tu
    Collomosse, John
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 846 - 854
  • [36] Question-controlled Text-aware Image Captioning
    Hu, Anwen
    Chen, Shizhe
    Jin, Qin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3097 - 3105
  • [37] Dancing with words: Using animated text for captioning
    Rashid, Raisa
    Vy, Quoc
    Hunt, Richard
    Fels, Deborah I.
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2008, 24 (05) : 505 - 519
  • [38] ragBERT: Relationship-aligned and grammar-wise BERT model for image captioning
    Wang, Hengyou
    Song, Kani
    Jiang, Xiang
    He, Zhiquan
    IMAGE AND VISION COMPUTING, 2024, 148
  • [39] Performance Evaluation of Text Augmentation Methods with BERT on Small-sized, Imbalanced Datasets
    Hu, Lingshu
    Li, Can
    Wang, Wenbo
    Pang, Bin
    Shang, Yi
    2022 IEEE 4TH INTERNATIONAL CONFERENCE ON COGNITIVE MACHINE INTELLIGENCE, COGMI, 2022, : 125 - 133
  • [40] XAI for Image Captioning using SHAP
    Dewi, Christine
    Chen, Rung-Ching
    Yu, Hui
    Jiang, Xiaoyi
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2023, 39 (04) : 711 - 724