Boosting image caption generation with feature fusion module

Cited by: 12
Authors
Xia, Pengfei [1 ]
He, Jingsong [1 ]
Yin, Jin [1 ]
Affiliation
[1] Univ Sci & Technol China, Hefei, Peoples R China
Keywords
Image caption; Feature fusion module; Encoder-decoder model; ATTENTION; MODELS;
DOI
10.1007/s11042-020-09110-2
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Image caption generation is regarded as a key problem in vision-to-language tasks. Previous work commonly uses a classification model, such as AlexNet, VGG, or ResNet, as the encoder to extract image features. However, there is an explicit gap between the image features required by the caption task and those required by the classification task, and this gap has not received wide attention. In this paper, we propose a novel custom structure, named feature fusion module (FFM), to make the features extracted by the encoder more suitable for the caption task. We evaluate the proposed module with two typical models, NIC (Neural Image Caption) and SA (Soft Attention), on two popular benchmarks, MS COCO and Flickr30k. We consistently observe that FFM boosts performance and outperforms state-of-the-art methods on five metrics.
Pages: 24225 - 24239
Page count: 15
Related papers
50 in total
  • [1] Boosting image caption generation with feature fusion module
    Pengfei Xia
    Jingsong He
    Jin Yin
    Multimedia Tools and Applications, 2020, 79 : 24225 - 24239
  • [2] FFGS: Feature Fusion with Gating Structure for Image Caption Generation
    Yuan, Aihong
    Li, Xuelong
    Lu, Xiaoqiang
    COMPUTER VISION, PT I, 2017, 771 : 638 - 649
  • [3] Image-Caption Model Based on Fusion Feature
    Geng, Yaogang
    Mei, Hongyan
    Xue, Xiaorong
    Zhang, Xing
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [4] Image Caption Automatic Generation Method Based on Weighted Feature
    Xi, Su Mei
    Cho, Young Im
    2013 13TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2013), 2013, : 548 - 551
  • [5] Neural Image Caption Generation with Global Feature Based Attention Scheme
    Wang, Yongzhuang
    Xiong, Hongkai
    IMAGE AND GRAPHICS (ICIG 2017), PT II, 2017, 10667 : 51 - 61
  • [6] A NOVEL SEMANTIC ATTRIBUTE-BASED FEATURE FOR IMAGE CAPTION GENERATION
    Wang, Wei
    Ding, Yuxuan
    Tian, Chunna
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 3081 - 3085
  • [7] TVPRNN for image caption generation
    Yang, Liang
    Hu, Haifeng
    ELECTRONICS LETTERS, 2017, 53 (22) : 1471 - +
  • [8] CNN image caption generation
    Li Y.
    Cheng H.
    Liang X.
    Guo Q.
    Qian Y.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (02) : 152 - 157
  • [9] Image steganalysis based on feature fusion by improved boosting feature selection algorithm
    Zhang, M.-Q. (api_zmq@126.com), 1600, Board of Optronics Lasers (25):
  • [10] Image caption generation method based on an interaction mechanism and scene concept selection module
    Zhang, Liping
    Lu, Qin
    2021 IEEE 32ND INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2021), 2021, : 141 - 148