Text Augmentation Using BERT for Image Captioning

Cited by: 15
Authors
Atliha, Viktar [1]
Sesok, Dmitrij [1]
Affiliation
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / No. 17
Keywords
image captioning; augmentation; BERT
DOI
10.3390/app10175978
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Image captioning is an important task for improving human-computer interaction as well as for gaining a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly, and a number of impressive results have been achieved. Typical models are based on neural networks, including convolutional networks for encoding images and recurrent networks for decoding them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality when training data are scarce: generating a variety of descriptions of objects in different situations requires a large training set. Commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline, and augmentation with the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
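The abstract names two caption-level augmentation strategies (synonym replacement as a baseline, and BERT-based replacement) without spelling out the procedure. The following is a minimal sketch under stated assumptions, not the authors' exact method: captions are whitespace-tokenized, `TOY_SYNONYMS` is a hypothetical toy table standing in for a lexical resource such as WordNet, and `fill_mask` is a hypothetical callable standing in for a query to a BERT masked language model. Each original caption is kept and extra augmented variants are appended, which is one way to raise the number of captions per image.

```python
import random
from typing import Callable, Dict, List

# Hypothetical toy synonym table standing in for a lexical resource such as
# WordNet; keys are lowercase words.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "small": ["little", "tiny"],
    "dog": ["puppy"],
    "street": ["road"],
}


def synonym_augment(caption: str, synonyms: Dict[str, List[str]],
                    p: float, rng: random.Random) -> str:
    """Baseline: replace each word that has a listed synonym with probability p.

    Assumes whitespace-tokenized captions without punctuation.
    """
    out = []
    for word in caption.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def masked_lm_augment(caption: str, fill_mask: Callable[[str], str],
                      p: float, rng: random.Random,
                      mask_token: str = "[MASK]") -> str:
    """Mask selected words one at a time and let a masked language model
    (e.g. BERT) propose a replacement for each.

    fill_mask is a hypothetical callable: given a sentence containing exactly
    one mask_token, it returns a predicted word for that position. Each query
    is built from the original caption, so earlier substitutions do not
    change the context of later ones.
    """
    original = caption.split()
    words = list(original)
    for i in range(len(original)):
        if rng.random() < p:
            masked = original[:i] + [mask_token] + original[i + 1:]
            words[i] = fill_mask(" ".join(masked))
    return " ".join(words)


def expand_captions(captions: List[str], augment: Callable[[str], str],
                    copies: int) -> List[str]:
    """Return the original captions followed by `copies` augmented variants of each."""
    expanded = list(captions)
    for caption in captions:
        expanded.extend(augment(caption) for _ in range(copies))
    return expanded


if __name__ == "__main__":
    rng = random.Random(0)
    captions = ["a small dog runs across the street"]
    variants = expand_captions(
        captions, lambda cap: synonym_augment(cap, TOY_SYNONYMS, 0.5, rng), 2)
    for line in variants:
        print(line)
```

With a real model, `fill_mask` could wrap a BERT masked-LM query (for instance, returning the highest-scoring predicted token); that wrapper is left out so the sketch runs without downloading a model. Keeping the originals alongside the variants means augmentation only adds training captions and never removes the human-written ones.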
Pages: 11
Related papers (50 in total)
  • [1] Text Augmentation for Compressed Image Captioning Models
    Atliha, Viktar
    Sesok, Dmitrij
    2022 IEEE OPEN CONFERENCE OF ELECTRICAL, ELECTRONIC AND INFORMATION SCIENCES (ESTREAM), 2022
  • [2] Image Captioning using Deep Learning: Text Augmentation by Paraphrasing via Backtranslation
    Turkerud, Ingrid Ravn
    Mengshoel, Ole Jakob
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021
  • [3] Multimodal Data Augmentation for Image Captioning using Diffusion Models
    Xiao, Changrong
    Xu, Sean Xin
    Zhang, Kunpeng
    PROCEEDINGS OF THE 1ST WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM3A 2023, 2023: 23-33
  • [4] News Image Captioning Based On Text Summarization Using Image As Query
    Chen, Jingqiang
    Hai Zhuge
    2019 15TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2019), 2019: 123-126
  • [5] Image captioning with data augmentation using cropping and mask based on attention image
    Iwamura K.
    Louhi Kasahara J.Y.
    Moro A.
    Yamashita A.
    Asama H.
    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2020, 86 (11): 904-910
  • [6] Text to Image Synthesis for Improved Image Captioning
    Hossain, Md. Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    IEEE ACCESS, 2021, 9: 64918-64928
  • [7] High-level Image Classification by Synergizing Image Captioning with BERT
    Yu, Xiaohong
    Ahn, Yoseop
    Jeong, Jaehoon
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021: 1686-1690
  • [8] Improving Automatic Image Captioning Using Text Summarization Techniques
    Plaza, Laura
    Lloret, Elena
    Aker, Ahmet
    TEXT, SPEECH AND DIALOGUE, 2010, 6231: 165+
  • [9] Visual to Text: Survey of Image and Video Captioning
    Li, Sheng
    Tao, Zhiqiang
    Li, Kang
    Fu, Yun
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2019, 3 (04): 297-312
  • [10] Image Captioning Generator Text-to-Speech
    Sharma, Tripti
    Anand, Neetu
    Gaurav, Kumar
    Kapur, Rohit
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2022, 13 (03): 448-457