Text Augmentation Using BERT for Image Captioning

Cited by: 15
|
Authors
Atliha, Viktar [1]
Sesok, Dmitrij [1]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 17
Keywords
image captioning; augmentation; BERT;
DOI
10.3390/app10175978
Chinese Library Classification
O6 [Chemistry];
Discipline code
0703 ;
Abstract
Image captioning is an important task both for improving human-computer interaction and for gaining a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, using convolutional networks to encode images and recurrent networks to decode them into text. In addition, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality when data are lacking: generating a variety of descriptions of objects in different situations requires a large training set. The datasets in common use, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline, and augmentation with the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
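The baseline augmentation described in the abstract, replacing caption words with synonyms, can be sketched as follows. This is an illustrative sketch only: the synonym table and function names are hypothetical stand-ins, not the paper's implementation, and the paper's stronger BERT variant instead masks a word and lets a masked language model propose context-aware replacements.

```python
import random

# Hypothetical synonym table for illustration; the paper does not
# publish its lexicon. The BERT variant would replace this lookup
# with masked-token predictions from a pretrained language model.
SYNONYMS = {
    "big": ["large", "huge"],
    "dog": ["puppy", "hound"],
    "runs": ["sprints", "dashes"],
}

def augment_caption(caption, n_variants=2, seed=0):
    """Generate caption variants by swapping one word for a synonym."""
    rng = random.Random(seed)
    words = caption.split()
    # Indices of words that have an entry in the synonym table.
    replaceable = [i for i, w in enumerate(words) if w in SYNONYMS]
    variants = []
    for _ in range(n_variants):
        if not replaceable:
            break
        i = rng.choice(replaceable)
        new_words = list(words)
        new_words[i] = rng.choice(SYNONYMS[words[i]])
        variants.append(" ".join(new_words))
    return variants

print(augment_caption("a big dog runs on the grass"))
```

Each variant keeps the sentence structure and changes a single word, which multiplies the number of distinct captions per image without collecting new annotations.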
Pages: 11
Related Papers
50 records total
  • [31] A Text-Guided Generation and Refinement Model for Image Captioning
    Wang, Depeng
    Hu, Zhenzhen
    Zhou, Yuanen
    Hong, Richang
    Wang, Meng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2966 - 2977
  • [32] OCR-oriented Master Object for Text Image Captioning
    Tang, Wenliang
    Hu, Zhenzhen
    Song, Zijie
    Hong, Richang
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 39 - 43
  • [33] Visual-Text Reference Pretraining Model for Image Captioning
    Li, Pengfei
    Zhang, Min
    Lin, Peijie
    Wan, Jian
    Jiang, Ming
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [34] Learning Text-to-Video Retrieval from Image Captioning
    Ventura, Lucas
    Schmid, Cordelia
    Varol, Gul
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 1834 - 1854
  • [35] VIXEN: Visual Text Comparison Network for Image Difference Captioning
    Black, Alexander
    Shi, Jing
    Fan, Yifei
    Bui, Tu
    Collomosse, John
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 846 - 854
  • [36] Question-controlled Text-aware Image Captioning
    Hu, Anwen
    Chen, Shizhe
    Jin, Qin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3097 - 3105
  • [37] Dancing with words: Using animated text for captioning
    Rashid, Raisa
    Vy, Quoc
    Hunt, Richard
    Fels, Deborah I.
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2008, 24 (05) : 505 - 519
  • [38] ragBERT: Relationship-aligned and grammar-wise BERT model for image captioning
    Wang, Hengyou
    Song, Kani
    Jiang, Xiang
    He, Zhiquan
    IMAGE AND VISION COMPUTING, 2024, 148
  • [39] Performance Evaluation of Text Augmentation Methods with BERT on Small-sized, Imbalanced Datasets
    Hu, Lingshu
    Li, Can
    Wang, Wenbo
    Pang, Bin
    Shang, Yi
    2022 IEEE 4TH INTERNATIONAL CONFERENCE ON COGNITIVE MACHINE INTELLIGENCE, COGMI, 2022, : 125 - 133
  • [40] XAI for Image Captioning using SHAP
    Dewi, Christine
    Chen, Rung-Ching
    Yu, Hui
    Jiang, Xiaoyi
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2023, 39 (04) : 711 - 724