Text Augmentation Using BERT for Image Captioning

Cited by: 15
Authors
Atliha, Viktar [1 ]
Sesok, Dmitrij [1 ]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 17
Keywords
image captioning; augmentation; BERT;
DOI
10.3390/app10175978
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Image captioning is an important task both for improving human-computer interaction and for a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. Typical models are based on neural networks, including convolutional networks for encoding images and recurrent networks for decoding them into text. Moreover, attention mechanisms and transformers are actively used to boost performance. However, even the best models are limited in quality when data are scarce. Generating a variety of descriptions of objects in different situations requires a large training set. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods. These methods include augmentation with synonyms as a baseline and augmentation with the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
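The BERT-based augmentation described in the abstract can be sketched as follows: mask a word in an existing caption and let a masked language model propose replacement words, producing additional caption variants per image. The snippet below is a minimal illustration assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the function name augment_caption, the one-word-per-variant masking, and the candidate filtering are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of caption augmentation with BERT's masked language model.
# Assumes the Hugging Face `transformers` library; model choice and masking
# strategy are illustrative, not the paper's exact setup.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment_caption(caption: str, num_variants: int = 2) -> list[str]:
    """Create caption variants by replacing one word with a BERT suggestion."""
    words = caption.split()
    variants = []
    for _ in range(num_variants):
        idx = random.randrange(len(words))
        masked = words.copy()
        masked[idx] = fill_mask.tokenizer.mask_token  # insert "[MASK]"
        predictions = fill_mask(" ".join(masked), top_k=5)
        # Skip predictions that simply restore the original word.
        candidates = [p["token_str"] for p in predictions
                      if p["token_str"].lower() != words[idx].lower()]
        if candidates:
            new_words = words.copy()
            new_words[idx] = candidates[0]
            variants.append(" ".join(new_words))
    return variants

print(augment_caption("a man riding a horse on the beach"))
```

In this sketch, each variant keeps the original sentence structure and swaps a single token, which mirrors the idea of enlarging the set of distinct captions per image without changing the image-caption pairing.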
Pages: 11
Related Papers
(50 records in total)
  • [41] Image Captioning using Deep Learning
    Jain, Yukti Sanjay
    Dhopeshwar, Tanisha
    Chadha, Supreet Kaur
    Pagire, Vrushali
    2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2021), 2021,
  • [42] Fast image captioning using LSTM
    Meng Han
    Wenyu Chen
    Alemu Dagmawi Moges
    Cluster Computing, 2019, 22 : 6143 - 6155
  • [43] Fast image captioning using LSTM
    Han, Meng
    Chen, Wenyu
    Moges, Alemu Dagmawi
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 3): S6143 - S6155
  • [44] Image Captioning Using Deep Learning
    Adithya, Paluvayi Veera
    Kalidindi, Mourya Viswanadh
    Swaroop, Nallani Jyothi
    Vishwas, H. N.
    ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT III, 2024, 2092 : 42 - 58
  • [45] Japanese abstractive text summarization using BERT
    Iwasaki, Yuuki
    Yamashita, Akihiro
    Konno, Yoko
    Matsubayashi, Katsushi
    2019 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2019,
  • [46] Turkish Medical Text Classification Using BERT
    Celikten, Azer
    Bulut, Hasan
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [47] Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs
    Fu, Kun
    Li, Jin
    Jin, Junqi
    Zhang, Changshui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (12) : 5910 - 5921
  • [48] Multimodal NLP for image captioning : Fusing text and image modalities for accurate and informative descriptions
    Tiwari, Manisha
    Khare, Pragati
    Saha, Ishani
    Mali, Mahesh
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2024, 45 (04): 1041 - 1049
  • [49] Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
    Luo, Jianjie
    Chen, Jingwen
    Li, Yehao
    Pan, Yingwei
    Feng, Jianlin
    Chao, Hongyang
    Yao, Ting
    COMPUTER VISION-ECCV 2024, PT LVII, 2025, 15115 : 237 - 254
  • [50] Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning
    Wang, Jing
    Tang, Jinhui
    Luo, Jiebo
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 4346 - 4354