Boosting image caption generation with feature fusion module

Cited by: 12
Authors
Xia, Pengfei [1]
He, Jingsong [1]
Yin, Jin [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
Keywords
Image caption; Feature fusion module; Encoder-decoder model; ATTENTION; MODELS;
DOI
10.1007/s11042-020-09110-2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image caption generation is regarded as a key problem in vision-to-language tasks. Previous work commonly uses a classification model, such as AlexNet, VGG, or ResNet, as the encoder to extract image features. However, the image features required by the captioning task differ markedly from those required by the classification task, and this gap has received little attention. In this paper, we propose a novel custom structure, named the feature fusion module (FFM), to make the features extracted by the encoder more suitable for the captioning task. We evaluate the proposed module with two typical models, NIC (Neural Image Caption) and SA (Soft Attention), on two popular benchmarks, MS COCO and Flickr30k. FFM consistently boosts performance and outperforms state-of-the-art methods on five metrics.
Pages: 24225-24239
Number of pages: 15
Related papers
50 items in total
  • [31] Topic-Based Image Caption Generation
    Sandeep Kumar Dash
    Shantanu Acharya
    Partha Pakray
    Ranjita Das
    Alexander Gelbukh
    Arabian Journal for Science and Engineering, 2020, 45 : 3025 - 3034
  • [32] Topic-Specific Image Caption Generation
    Zhou, Chang
    Mao, Yuzhao
    Wang, Xiaojie
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2017, 2017, 10565 : 321 - 332
  • [33] Image caption generation with high-level image features
    Ding, Songtao
    Qu, Shiru
    Xi, Yuling
    Sangaiah, Arun Kumar
    Wan, Shaohua
    PATTERN RECOGNITION LETTERS, 2019, 123 : 89 - 95
  • [34] AACR: Feature Fusion Effects of Algebraic Amalgamation Composed Representation on (De)Compositional Network for Caption Generation for Images
    Sur C.
    SN Computer Science, 2020, 1 (4)
  • [35] LitefusionNet: Boosting the performance for medical image classification with an intelligent and lightweight feature fusion network
    Asif, Sohaib
    Qurrat-ul Ain
    Al-Sabri, Raeed
    Abdullah, Monir
    JOURNAL OF COMPUTATIONAL SCIENCE, 2024, 80
  • [36] Automatic image caption generation using deep learning
    Verma, Akash
    Yadav, Arun Kumar
    Kumar, Mohit
    Yadav, Divakar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 5309 - 5325
  • [37] Transformer based image caption generation for news articles
    Pande, Ashtavinayak
    Pandey, Atul
    Solanki, Ayush
    Shanbhag, Chinmay
    Motghare, Manish
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2023, 14 (01)
  • [38] Bahdanau Attention Based Bengali Image Caption Generation
    Alam, Md Sahrial
    Rahman, Md Sayedur
    Hosen, Md Ikbal
    Mubin, Khairul Anam
    Hossen, Sharif
    Mridha, M. F.
    2022 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATIONS (DASA), 2022 : 1073 - 1077
  • [39] Image Caption Generation with Local Semantic and Global Information
    Liu, Xing
    Liu, Weibin
    Xing, Weiwei
    2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019 : 680 - 685
  • [40] Deep Neural Networks for Efficient Image Caption Generation
    Rai, Riddhi
    Guruprasad, Navya Shimoga
    Tumuluru, Shreya Sindhu
    ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT II, 2024, 2091 : 247 - 260