Textual Description Generation for Visual Content Using Neural Networks

Cited by: 2
Authors
Garg, Komal [1]
Singh, Varsha [1]
Tiwary, Uma Shanker [1]
Affiliation
[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India
Keywords
Convolutional Neural Network; Long Short-Term Memory; Bilingual Evaluation Understudy Score; AUTOMATIC IMAGE;
DOI
10.1007/978-3-030-98404-5_2
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning methods are widely used to generate descriptive text for images and video frames and to process them. The area has attracted considerable research interest in recent years. For text generation, many models combine a CNN with an RNN. RNNs work well for language modeling but struggle to retain information over long sequences. An LSTM language model overcomes this drawback because it handles long-term dependencies. The proposed methodology is an Encoder-Decoder approach in which a VGG19 Convolutional Neural Network serves as the Encoder and an LSTM language model serves as the Decoder that generates the sentence. The model is trained and tested on the Flickr8K dataset and can generate textual descriptions on the larger Flickr30K dataset with only slight modifications. The results are evaluated using BLEU (Bilingual Evaluation Understudy) scores. A GUI tool is developed to support child education. The tool generates audio for the generated textual description of an image and helps search for similar content on the internet.
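The abstract names BLEU as the evaluation metric but does not say how it was computed. As a rough illustration only, the sketch below implements plain, unsmoothed sentence-level BLEU (clipped n-gram precision up to 4-grams, combined by a geometric mean and a brevity penalty). The function names `ngrams`, `modified_precision`, and `bleu` are illustrative and not taken from the paper, and the authors may have used a library implementation or a smoothed variant.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference captions."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: the shorter one).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


if __name__ == "__main__":
    # Hypothetical captions, not drawn from Flickr8K.
    references = ["a dog runs across the green grass".split(),
                  "a dog is running on the grass".split()]
    candidate = "a dog runs on the grass".split()
    print(round(bleu(candidate, references, max_n=2), 3))
```

Captions are short, so higher-order n-gram matches are often missing and the unsmoothed score drops to zero. That is one reason captioning work commonly reports BLEU-1 through BLEU-4 separately, which here corresponds to varying `max_n`.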
Pages: 16-26
Page count: 11