An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi

Cited by: 3
Authors
Mishra, Santosh Kumar [1 ]
Peethala, Mahesh Babu [1 ]
Saha, Sriparna [1 ]
Bhattacharyya, Pushpak [2 ]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna, Bihar, India
[2] Indian Inst Technol, Mumbai, Maharashtra, India
DOI
10.1109/SMC52423.2021.9658859
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Image captioning is a multi-modal problem that links computer vision and natural language processing, combining the challenges of image analysis and text generation. In the literature, most image captioning work has been carried out for the English language only. This paper proposes a new approach for image captioning in the Hindi language using a deep learning-based encoder-decoder architecture. Hindi, widely spoken in India and South Asia, is the fourth most spoken language globally and is India's official language. In recent years, significant advances have been made in image captioning using encoder-decoder architectures based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The encoder CNN extracts features from input images, whereas the decoder RNN performs language modeling. The proposed encoder-decoder architecture utilizes information multiplexing in the encoder CNN to achieve a performance gain in feature extraction. Extensive experimentation is carried out on the benchmark MSCOCO Hindi dataset, and significant improvements in BLEU score over the baselines are reported. Manual human evaluation of the adequacy and fluency of the generated captions further establishes the proposed method's efficacy in generating good-quality captions.
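For readers unfamiliar with the BLEU score mentioned in the abstract, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision, uniform weights, brevity penalty, no smoothing). It is not the authors' evaluation code: the function names `ngrams`, `modified_precision`, and `bleu` are illustrative, the toy tokens are made up, and published results are normally computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # The clip limit for each n-gram is its maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and brevity penalty (unsmoothed)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Closest reference length; ties are broken toward the shorter reference.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    # Toy, hypothetical tokens; real use would tokenize Hindi captions first.
    reference = "a man rides a horse on the beach".split()
    candidate = "a man rides a horse".split()
    print(bleu(candidate, [reference]))
```

Clipping prevents a caption from scoring well by repeating a common word, and the brevity penalty discounts captions that are shorter than the closest reference.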
Pages: 3019-3024
Number of pages: 6