Textual Description Generation for Visual Content Using Neural Networks

Cited by: 2
Authors
Garg, Komal [1]
Singh, Varsha [1]
Tiwary, Uma Shanker [1]
Affiliations
[1] Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
Keywords
Convolutional Neural Network; Long Short-Term Memory; Bilingual Evaluation Understudy Score; AUTOMATIC IMAGE;
DOI
10.1007/978-3-030-98404-5_2
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning methods are widely used to generate descriptive text for images and video frames and to process them, an area that has attracted considerable research interest in recent years. Many text-generation models combine a CNN with an RNN. RNNs work well for language modeling but struggle to retain information over long sequences; an LSTM language model overcomes this drawback because it handles long-term dependencies. The proposed methodology is an Encoder-Decoder approach in which a VGG19 Convolutional Neural Network serves as the Encoder and an LSTM language model serves as the Decoder that generates the sentence. The model is trained and tested on the Flickr8K dataset and can generate textual descriptions on the larger Flickr30K dataset with only slight modifications. Results are evaluated using BLEU (Bilingual Evaluation Understudy) scores. A GUI tool is also developed for use in child education: it generates audio for the textual description produced for an image and helps search for similar content on the internet.
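The abstract names BLEU as the evaluation metric but does not say how it was computed. As an illustration only, the following is a minimal sketch of sentence-level BLEU in plain Python, assuming uniform n-gram weights, no smoothing, and whitespace-tokenized captions. The function names and the sample captions are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Clip each n-gram at its maximum count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams (an assumption)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one empty precision zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)


# Hypothetical generated caption and reference captions for one image.
generated = "a dog runs across the grass".split()
refs = ["a dog runs across the green grass".split(),
        "a brown dog is running on grass".split()]
score = sentence_bleu(generated, refs)
```

Because no smoothing is applied, short captions that share no 4-gram with any reference score zero. Evaluations on caption datasets commonly report BLEU-1 through BLEU-4 separately, which corresponds to calling `sentence_bleu` with `max_n` set from 1 to 4.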
Pages: 16-26
Page count: 11