Textual Description Generation for Visual Content Using Neural Networks

Times cited: 2
Authors
Garg, Komal [1]
Singh, Varsha [1]
Tiwary, Uma Shanker [1]
Affiliation
[1] Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
Keywords
Convolutional Neural Network; Long Short-Term Memory; Bilingual Evaluation Understudy Score; AUTOMATIC IMAGE
DOI
10.1007/978-3-030-98404-5_2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Various machine learning methods are widely used to generate and process descriptive text for images and video frames, and this area has attracted immense research interest in recent years. For text generation, many models combine convolutional neural networks (CNNs) with recurrent neural networks (RNNs). Although RNNs work well for language modeling, they struggle to retain information over long sequences. An LSTM language model can overcome this drawback because it handles long-term dependencies. The proposed methodology is an encoder-decoder approach in which a VGG19 convolutional neural network acts as the encoder and an LSTM language model acts as the decoder that generates the sentence. The model is trained and tested on the Flickr8K dataset and, with slight modifications, can generate textual descriptions on the larger Flickr30K dataset. The results are evaluated using BLEU (Bilingual Evaluation Understudy) scores. A GUI tool was also developed to support child education: it generates audio for the textual description produced for an image and helps search the internet for similar content.
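The abstract says the captions are evaluated with BLEU but does not state which variant is used. The sketch below is a minimal corpus-level BLEU in plain Python, assuming the standard formulation: clipped n-gram precision against all reference captions of an image, uniform weights over 1- to `max_n`-grams, a brevity penalty based on the closest reference length, and no smoothing. The names `ngrams` and `corpus_bleu` are hypothetical and are not taken from the paper. Each candidate is scored against a list of references because Flickr8K provides five human captions per image.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with uniform weights 1/max_n and no smoothing (assumed variant).

    candidates: list of token lists, one generated caption per image.
    references: list of lists of token lists, the human captions for each image.
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # total candidate n-grams, per order
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the
        # candidate, with ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            # Highest count of each n-gram in any single reference caption.
            max_ref = Counter()
            for r in refs:
                for g, c in ngrams(r, n).items():
                    max_ref[g] = max(max_ref[g], c)
            # Clip each candidate n-gram count by its maximum reference count.
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0  # the geometric mean is zero if any n-gram precision is zero
    log_p = sum(math.log(clipped[i] / totals[i]) for i in range(max_n)) / max_n
    # Brevity penalty: penalize corpora whose captions are shorter than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_p)


cand = "a dog runs on the grass".split()
refs = ["a dog runs on the grass".split(), "a brown dog is running".split()]
print(corpus_bleu([cand], [refs]))  # 1.0
```

Calling the function with `max_n=1` through `max_n=4` gives BLEU-1 to BLEU-4, the scores commonly reported for image captioning; the abstract does not say which of these the paper reports. Without smoothing, short captions that share no 4-gram with any reference score 0.0 under BLEU-4, which is one reason published results usually aggregate over the whole test corpus.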
Pages: 16-26
Number of pages: 11