Text Augmentation Using BERT for Image Captioning

Times Cited: 15
Authors
Atliha, Viktar [1]
Sesok, Dmitrij [1]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 17
Keywords
image captioning; augmentation; BERT;
DOI
10.3390/app10175978
Chinese Library Classification
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Image captioning is an important task for improving human-computer interaction as well as for gaining a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, with convolutional networks encoding images and recurrent networks decoding them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality when training data are scarce. Generating a variety of descriptions of objects in different situations requires a large training set. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline and augmentation based on the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
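A minimal sketch of how the BERT-based augmentation described in the abstract could be implemented (an illustrative assumption, not the authors' exact pipeline): each caption word is masked in turn and a pretrained BERT masked language model proposes context-aware replacements, while the synonym baseline would instead draw replacements from a thesaurus such as WordNet. The model name bert-base-uncased and the augment_caption helper are illustrative choices.

from transformers import pipeline

# Illustrative setup (assumed, not taken from the paper): a pretrained BERT
# masked language model exposed through the Hugging Face fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment_caption(caption, top_k=2):
    """Hypothetical helper: mask one caption word at a time and let BERT
    propose context-aware replacements, yielding paraphrased captions."""
    words = caption.split()
    augmented = []
    for i in range(len(words)):
        masked = words.copy()
        masked[i] = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT
        for pred in fill_mask(" ".join(masked), top_k=top_k):
            candidate = pred["token_str"].strip()
            # Keep only whole-word predictions that change the original word.
            if candidate.startswith("##") or candidate.lower() == words[i].lower():
                continue
            augmented.append(" ".join(words[:i] + [candidate] + words[i + 1:]))
    return augmented

# Example: one original caption yields several additional training captions.
print(augment_caption("a dog is running on the beach"))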
Pages: 11
Related Papers
50 records in total
  • [21] Image Captioning with Text-Based Visual Attention
    He, Chen
    Hu, Haifeng
    NEURAL PROCESSING LETTERS, 2019, 49 (01) : 177 - 185
  • [22] Text-Guided Attention Model for Image Captioning
    Mun, Jonghwan
    Cho, Minsu
    Han, Bohyung
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4233 - 4239
  • [24] Automatic image captioning system based on augmentation and ranking mechanism
    Revathi, B. S.
    Kowshalya, A. Meena
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (01) : 265 - 274
  • [25] Improving scene text image captioning using transformer-based multilevel attention
    Srivastava, Swati
    Sharma, Himanshu
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (03)
  • [26] Text Simplification Using Transformer and BERT
    Alissa, Sarah
    Wald, Mike
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): : 3479 - 3495
  • [27] Relational Distant Supervision for Image Captioning without Image-Text Pairs
    Qi, Yayun
    Zhao, Wentian
    Wu, Xinxiao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4524 - 4532
  • [28] Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
    Yang, Cong
    Li, Zuchao
    Zhang, Lefei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12
  • [29] More Grounded Image Captioning by Distilling Image-Text Matching Model
    Zhou, Yuanen
    Wang, Meng
    Liu, Daqing
    Hu, Zhenzhen
    Zhang, Hanwang
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4776 - 4785
  • [30] Enhanced Text-Guided Attention Model for Image Captioning
    Zhou, Yuanen
    Hu, Zhenzhen
    Zhao, Ye
    Liu, Xueliang
    Hong, Richang
2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018