Textual Description Generation for Visual Content Using Neural Networks

Cited by: 2
Authors
Garg, Komal [1]
Singh, Varsha [1]
Tiwary, Uma Shanker [1]
Affiliations
[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India
Keywords
Convolutional Neural Network; Long Short-Term Memory; Bilingual Evaluation Understudy Score; AUTOMATIC IMAGE;
DOI
10.1007/978-3-030-98404-5_2
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning methods are widely used for generating descriptive text for images and video frames and for processing them, an area that has attracted considerable research interest in recent years. Many text generation models combine a CNN with an RNN. An RNN works well for language modeling but struggles to retain information over long sequences. An LSTM language model overcomes this drawback because it handles long-term dependencies. The proposed methodology is an encoder-decoder approach in which a VGG19 convolutional neural network serves as the encoder and an LSTM language model serves as the decoder that generates the sentence. The model is trained and tested on the Flickr8K dataset and, with only slight modifications, can generate textual descriptions on the larger Flickr30K dataset. The results are evaluated using BLEU (Bilingual Evaluation Understudy) scores. A GUI tool is also developed for use in child education: it generates audio for the textual description produced for an image and helps search for similar content on the internet.
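The abstract reports results as BLEU scores but does not say which variant or tooling was used. As a rough illustration only, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision, geometric mean, brevity penalty) in plain Python. The function names and the example captions are hypothetical and are not taken from the paper; no smoothing is applied, so the score is 0 whenever any n-gram order has no match.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)


# Hypothetical generated caption and reference captions for one image.
generated = "a dog runs on the grass".split()
references = ["a dog runs on the grass".split(),
              "a brown dog is running across a field".split()]
print(bleu(generated, references))
```

In practice, captioning results are usually reported as corpus-level BLEU-1 to BLEU-4 computed with an established library, and those numbers can differ from this per-sentence sketch.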
Pages: 16 - 26
Page count: 11
Related papers
50 in total
  • [21] Visual and textual content based indexing and retrieval
    Chabane Djeraba
    Marinette Bouet
    Henri Briand
    Ali Khenchaf
    International Journal on Digital Libraries, 2000, 2 (4) : 269 - 287
  • [22] Visual and Textual Sentiment Analysis of Brand-Related Social Media Pictures Using Deep Convolutional Neural Networks
    Paolanti, Marina
    Kaiser, Carolin
    Schallner, Rene
    Frontoni, Emanuele
    Zingaretti, Primo
    IMAGE ANALYSIS AND PROCESSING (ICIAP 2017), PT I, 2017, 10484 : 402 - 413
  • [23] Video Description Using Bidirectional Recurrent Neural Networks
    Peris, Alvaro
    Bolanos, Marc
    Radeva, Petia
    Casacuberta, Francisco
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2016, PT II, 2016, 9887 : 3 - 11
  • [24] Automatic playlist generation using Convolutional Neural Networks and Recurrent Neural Networks
    Irene, Rosilde Tatiana
    Borrelli, Clara
    Zanoni, Massimiliano
    Buccoli, Michele
    Sarti, Augusto
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [25] Facial Image Generation from Bangla Textual Description using DCGAN and Bangla FastText
    Arnob, Noor Mairukh Khan
    Rahman, Nakiba Nuren
    Mahmud, Saiyara
    Uddin, Md. Nahiyan
    Rahman, Rashik
    Saha, Aloke Kumar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (06) : 1261 - 1271
  • [26] Texture generation using cellular neural networks
    Debiec, Piotor
    Kornatowski, Lukasz
    Slot, Krzysztof
    Kim, Hyongsuk
    INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 2006, 16 (12) : 3655 - 3668
  • [27] Crossing textual and visual content in different application scenarios
    Ah-Pine, Julien
    Bressan, Marco
    Clinchant, Stephane
    Csurka, Gabriela
    Hoppenot, Yves
    Renders, Jean-Michel
    MULTIMEDIA TOOLS AND APPLICATIONS, 2009, 42 (01) : 31 - 56
  • [28] Crossing textual and visual content in different application scenarios
    Julien Ah-Pine
    Marco Bressan
    Stephane Clinchant
    Gabriela Csurka
    Yves Hoppenot
    Jean-Michel Renders
    Multimedia Tools and Applications, 2009, 42 : 31 - 56
  • [29] A Hybrid Approach to Content Based Image Retrieval Using Visual Features and Textual Queries
    Sudhakar, R.
    Krishnan, K. Raghesh
    Muthukrishnan, S.
    2011 THIRD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2011, : 241 - 247
  • [30] Visual Coreference Resolution in Visual Dialog Using Neural Module Networks
    Kottur, Satwik
    Moura, Jose M. F.
    Parikh, Devi
    Batra, Dhruv
    Rohrbach, Marcus
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 160 - 178