Textual Description Generation for Visual Content Using Neural Networks

Cited by: 2

Authors:

Garg, Komal [1]

Singh, Varsha [1]

Tiwary, Uma Shanker [1]

Affiliation:

[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India

Source:

INTELLIGENT HUMAN COMPUTER INTERACTION, IHCI 2021 | 2022 / Vol. 13184

Keywords:

Convolutional Neural Network; Long Short-Term Memory; Bilingual Evaluation Understudy Score; AUTOMATIC IMAGE;

DOI:

10.1007/978-3-030-98404-5_2

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Machine learning methods are widely used for generating descriptive text for images and video frames and for processing them. This area has attracted considerable research interest in recent years. Many text generation models combine a CNN with an RNN. An RNN works well for language modeling, but it struggles to retain information over long sequences. An LSTM language model can overcome this drawback because it handles long-term dependencies. The proposed methodology is an Encoder-Decoder approach in which a VGG19 Convolutional Neural Network serves as the Encoder and an LSTM language model serves as the Decoder that generates the sentence. The model is trained and tested on the Flickr8K dataset and can generate textual descriptions on the larger Flickr30K dataset with only slight modifications. The results are evaluated using BLEU (Bilingual Evaluation Understudy) scores. A GUI tool is developed for use in child education. The tool generates audio for the textual descriptions generated for images and helps users search for similar content on the internet.
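
The BLEU evaluation named in the abstract can be illustrated with a short sketch. The abstract does not say which BLEU variant, smoothing, or tooling the authors used, so the following is only a minimal, unsmoothed sentence-level BLEU with uniform n-gram weights, written in plain Python. The function names (ngrams, modified_precision, sentence_bleu) and the max_n parameter are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights (illustrative sketch)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, any zero precision makes the geometric mean zero.
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

In this sketch a generated caption that matches a reference caption word for word scores 1.0, while a caption that shares no words with any reference scores 0.0. Because no smoothing is applied, short captions with no matching 4-gram also score 0.0, which is one reason captioning work often reports BLEU-1 through BLEU-4 separately (here, by varying max_n).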

Pages: 16-26

Page count: 11
