Textual Description Generation for Visual Content Using Neural Networks

Times cited: 2

Authors:

Garg, Komal [1]

Singh, Varsha [1]

Tiwary, Uma Shanker [1]

Affiliations:

[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India

Source:

INTELLIGENT HUMAN COMPUTER INTERACTION, IHCI 2021 | 2022 / Vol. 13184

Keywords:

Convolutional Neural Network; Long Short-Term Memory; Bilingual Evaluation Understudy Score; AUTOMATIC IMAGE;

DOI:

10.1007/978-3-030-98404-5_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Machine learning methods are widely used to generate descriptive text for images and video frames and to process them. This area has attracted immense research interest in recent years. For text generation, many models combine CNN and RNN approaches. An RNN works well for language modeling but struggles to retain information over long sequences. An LSTM language model can overcome this drawback because it handles long-term dependencies. The proposed methodology is an Encoder-Decoder approach in which a VGG19 Convolutional Neural Network works as the Encoder and an LSTM language model works as the Decoder to generate the sentence. The model is trained and tested on the Flickr8K dataset and can generate textual descriptions on the larger Flickr30K dataset with only slight modifications. The results are evaluated using BLEU (Bilingual Evaluation Understudy) scores. A GUI tool is developed to help in the field of child education. This tool generates audio for the generated textual description of an image and helps to search for similar content on the internet.
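
The abstract reports results as BLEU scores but does not state the exact configuration used. As a rough illustration only, the following is a minimal sketch of the standard, unsmoothed BLEU computation (clipped n-gram precision combined with a brevity penalty) on pre-tokenized captions. The function names `ngrams`, `modified_precision`, and `bleu` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped by
    the largest count of that n-gram in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams.
    Assumption: inputs are lists of tokens; no smoothing is applied, so any
    zero n-gram precision makes the whole score zero."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

For example, `bleu("a dog runs".split(), ["a dog runs on grass".split()], max_n=2)` has perfect unigram and bigram precision, so the score is determined entirely by the brevity penalty for the short caption. Short generated captions often have no matching 4-grams, so a smoothed variant may be preferable in practice.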


Pages: 16-26

Page count: 11

50 items in total

[41] Visual representation of the speech trace using neural networks
Gomez, P
Rodellar, V
Alvarez, A
Mayo, N
Rubio, F
Nieto, V
Perez, MM
ISCAS 96: 1996 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - CIRCUITS AND SYSTEMS CONNECTING THE WORLD, VOL 3, 1996, : 586 - 589
[42] Towards Automatic Job Description Generation With Capability-Aware Neural Networks
Qin, Chuan
Yao, Kaichun
Zhu, Hengshu
Xu, Tong
Shen, Dazhong
Chen, Enhong
Xiong, Hui
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05) : 5341 - 5355
[43] Attention-Based Bidirectional Recurrent Neural Networks for Description Generation of Videos
Du, Xiaotong
Yuan, Jiabin
Liu, Hu
CLOUD COMPUTING AND SECURITY, PT VI, 2018, 11068 : 440 - 451
[44] Video Content Analysis using Convolutional Neural Networks
Aljarrah, Inad
Mohammad, Duaa
2018 9TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2018, : 122 - 126
[45] Estimation of fuel moisture content using neural networks
Riaño, D
Ustin, SL
Usero, L
Patricio, MA
ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING APPLICATIONS: A BIOINSPIRED APPROACH, PT 2, PROCEEDINGS, 2005, 3562 : 489 - 498
[46] Automatic Generation of Visual-Textual Presentation Layout
Yang, Xuyong
Mei, Tao
Xu, Ying-Qing
Rui, Yong
Li, Shipeng
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2016, 12 (02)
[47] Role-aware Interaction Generation from Textual Description
Tanaka, Mikihiro
Fujiwara, Kent
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15953 - 15963
[48] Exploring the benefits of images with frequency visual content in predicting human ocular scanpaths using Artificial Neural Networks
Do Nascimento, Camilo Jara
Orchard, Marcos E.
Devia, Christ
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 239
[49] Recognition of silhouettes of objects using a textual description
Aouat, Saliha
Larabi, Slimane
2008 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-4, 2008, : 149 - 152
[50] Dynamic Memory Networks for Visual and Textual Question Answering
Xiong, Caiming
Merity, Stephen
Socher, Richard
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
