Text Augmentation Using BERT for Image Captioning

Cited by: 15
Authors
Atliha, Viktar [1 ]
Sesok, Dmitrij [1 ]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 17
Keywords
image captioning; augmentation; BERT;
DOI
10.3390/app10175978
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Image captioning is an important task both for improving human-computer interaction and for a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, including convolutional networks for encoding images and recurrent networks for decoding them into text. Moreover, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality when data are scarce. Generating a variety of descriptions of objects in different situations requires a large training set. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods. These methods include augmentation with synonyms as a baseline and augmentation with the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
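The BERT-based augmentation described in the abstract can be sketched as follows: mask a word in an existing caption and let a masked language model propose replacement words, producing additional caption variants per image. The snippet below is a minimal illustration assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the function name augment_caption, the one-word-per-variant masking, and the candidate filtering are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of caption augmentation with BERT's masked language model.
# Assumes the Hugging Face `transformers` library; model choice and masking
# strategy are illustrative, not the paper's exact setup.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment_caption(caption: str, num_variants: int = 2) -> list[str]:
    """Create caption variants by replacing one word with a BERT suggestion."""
    words = caption.split()
    variants = []
    for _ in range(num_variants):
        idx = random.randrange(len(words))
        masked = words.copy()
        masked[idx] = fill_mask.tokenizer.mask_token  # insert "[MASK]"
        predictions = fill_mask(" ".join(masked), top_k=5)
        # Skip predictions that simply restore the original word.
        candidates = [p["token_str"] for p in predictions
                      if p["token_str"].lower() != words[idx].lower()]
        if candidates:
            new_words = words.copy()
            new_words[idx] = candidates[0]
            variants.append(" ".join(new_words))
    return variants

print(augment_caption("a man riding a horse on the beach"))
```

In this sketch, each variant keeps the original sentence structure and swaps a single token, which mirrors the idea of enlarging the set of distinct captions per image without changing the image-caption pairing.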
Pages: 11
Related Papers
(50 records in total)
  • [41] Image Captioning using Deep Learning
    Jain, Yukti Sanjay
    Dhopeshwar, Tanisha
    Chadha, Supreet Kaur
    Pagire, Vrushali
    2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2021), 2021,
  • [42] Fast image captioning using LSTM
    Meng Han
    Wenyu Chen
    Alemu Dagmawi Moges
    Cluster Computing, 2019, 22 : 6143 - 6155
  • [43] Fast image captioning using LSTM
    Han, Meng
    Chen, Wenyu
    Moges, Alemu Dagmawi
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 3): S6143 - S6155
  • [44] Image Captioning Using Deep Learning
    Adithya, Paluvayi Veera
    Kalidindi, Mourya Viswanadh
    Swaroop, Nallani Jyothi
    Vishwas, H. N.
    ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT III, 2024, 2092 : 42 - 58
  • [45] Japanese abstractive text summarization using BERT
    Iwasaki, Yuuki
    Yamashita, Akihiro
    Konno, Yoko
    Matsubayashi, Katsushi
    2019 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2019,
  • [46] Turkish Medical Text Classification Using BERT
    Celikten, Azer
    Bulut, Hasan
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [47] Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs
    Fu, Kun
    Li, Jin
    Jin, Junqi
    Zhang, Changshui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (12) : 5910 - 5921
  • [48] Multimodal NLP for image captioning : Fusing text and image modalities for accurate and informative descriptions
    Tiwari, Manisha
    Khare, Pragati
    Saha, Ishani
    Mali, Mahesh
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2024, 45 (04): 1041 - 1049
  • [49] Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
    Luo, Jianjie
    Chen, Jingwen
    Li, Yehao
    Pan, Yingwei
    Feng, Jianlin
    Chao, Hongyang
    Yao, Ting
    COMPUTER VISION-ECCV 2024, PT LVII, 2025, 15115 : 237 - 254
  • [50] Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning
    Wang, Jing
    Tang, Jinhui
    Luo, Jiebo
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 4346 - 4354