Text Augmentation Using BERT for Image Captioning

Times Cited: 15
Authors
Atliha, Viktar [1]
Sesok, Dmitrij [1]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 17
Keywords
image captioning; augmentation; BERT;
DOI
10.3390/app10175978
Chinese Library Classification
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Image captioning is an important task for improving human-computer interaction as well as for gaining a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, with convolutional networks encoding images and recurrent networks decoding them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality when training data are scarce. Generating a variety of descriptions of objects in different situations requires a large training set. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline and augmentation based on the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
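A minimal sketch of how the BERT-based augmentation described in the abstract could be implemented (an illustrative assumption, not the authors' exact pipeline): each caption word is masked in turn and a pretrained BERT masked language model proposes context-aware replacements, while the synonym baseline would instead draw replacements from a thesaurus such as WordNet. The model name bert-base-uncased and the augment_caption helper are illustrative choices.

from transformers import pipeline

# Illustrative setup (assumed, not taken from the paper): a pretrained BERT
# masked language model exposed through the Hugging Face fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment_caption(caption, top_k=2):
    """Hypothetical helper: mask one caption word at a time and let BERT
    propose context-aware replacements, yielding paraphrased captions."""
    words = caption.split()
    augmented = []
    for i in range(len(words)):
        masked = words.copy()
        masked[i] = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT
        for pred in fill_mask(" ".join(masked), top_k=top_k):
            candidate = pred["token_str"].strip()
            # Keep only whole-word predictions that change the original word.
            if candidate.startswith("##") or candidate.lower() == words[i].lower():
                continue
            augmented.append(" ".join(words[:i] + [candidate] + words[i + 1:]))
    return augmented

# Example: one original caption yields several additional training captions.
print(augment_caption("a dog is running on the beach"))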
Pages: 11
Related Papers
50 records in total
  • [21] Image Captioning with Text-Based Visual Attention
    He, Chen
    Hu, Haifeng
    NEURAL PROCESSING LETTERS, 2019, 49 (01) : 177 - 185
  • [22] Text-Guided Attention Model for Image Captioning
    Mun, Jonghwan
    Cho, Minsu
    Han, Bohyung
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4233 - 4239
  • [24] Automatic image captioning system based on augmentation and ranking mechanism
    Revathi, B. S.
    Kowshalya, A. Meena
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (01) : 265 - 274
  • [25] Improving scene text image captioning using transformer-based multilevel attention
    Srivastava, Swati
    Sharma, Himanshu
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (03)
  • [26] Text Simplification Using Transformer and BERT
    Alissa, Sarah
    Wald, Mike
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): : 3479 - 3495
  • [27] Relational Distant Supervision for Image Captioning without Image-Text Pairs
    Qi, Yayun
    Zhao, Wentian
    Wu, Xinxiao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4524 - 4532
  • [28] Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
    Yang, Cong
    Li, Zuchao
    Zhang, Lefei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12
  • [29] More Grounded Image Captioning by Distilling Image-Text Matching Model
    Zhou, Yuanen
    Wang, Meng
    Liu, Daqing
    Hu, Zhenzhen
    Zhang, Hanwang
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4776 - 4785
  • [30] Enhanced Text-Guided Attention Model for Image Captioning
    Zhou, Yuanen
    Hu, Zhenzhen
    Zhao, Ye
    Liu, Xueliang
    Hong, Richang
2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018