Textual Description Generation for Visual Content Using Neural Networks

Cited by: 2

Authors:

Garg, Komal [1]

Singh, Varsha [1]

Tiwary, Uma Shanker [1]

Affiliations:

[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India

Source:

INTELLIGENT HUMAN COMPUTER INTERACTION, IHCI 2021 | 2022 / Vol. 13184

Keywords:

Convolutional Neural Network; Long Short-Term Memory; Bilingual Evaluation Understudy Score; AUTOMATIC IMAGE;

DOI:

10.1007/978-3-030-98404-5_2

Chinese Library Classification:

TP18 [Theory of Artificial Intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Machine learning methods are widely used to generate descriptive text for images and video frames and to process such content. This area has attracted considerable research interest in recent years. Many text-generation models combine a CNN with an RNN. RNNs work well for language modeling but struggle to retain information over long sequences. An LSTM language model can overcome this drawback because it handles long-term dependencies. The proposed methodology is an Encoder-Decoder approach in which a VGG19 Convolutional Neural Network acts as the Encoder and an LSTM language model acts as the Decoder that generates the sentence. The model is trained and tested on the Flickr8K dataset and can generate textual descriptions on the larger Flickr30K dataset with only slight modifications. The results are evaluated using BLEU (Bilingual Evaluation Understudy) scores. A GUI tool is developed to support child education: it produces audio for the generated textual descriptions of images and helps search for similar content on the internet.
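
The abstract names BLEU as the evaluation metric but does not say which variant or tooling was used. As an illustration only, the following is a minimal sketch of sentence-level BLEU in plain Python, assuming whitespace-tokenized captions, uniform n-gram weights, and no smoothing. The function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (an assumption)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    c = len(candidate)
    # Reference length closest to the candidate length (shorter one on ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    generated = "a dog runs on the grass".split()
    human = ["a dog is running on the grass".split()]
    print(f"BLEU-1: {bleu(generated, human, max_n=1):.3f}")
    print(f"BLEU-2: {bleu(generated, human, max_n=2):.3f}")
```

Captioning work often reports BLEU-1 through BLEU-4 separately, which corresponds to calling `bleu` with `max_n` set from 1 to 4. Whether the paper follows that convention is not stated in the abstract.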


Pages: 16 - 26

Page count: 11
