A Novel Cross-Fusion Method of Different Types of Features for Image Captioning

Authors
Lou, Liangshan [1]
Lu, Ke [1, 2]
Xue, Jian [1]
Affiliations
[1] Univ Chinese Acad Sci, Sch Engn Sci, Beijing 100049, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Image Captioning; Region Features; Grid Features; Transformer; Cross-Fusion; LANGUAGE;
DOI
10.1109/IJCNN54540.2023.10191971
Chinese Library Classification (CLC) Number
TP18 [Artificial intelligence theory];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal tasks, including image captioning, are receiving increasing attention. Based on X-Linear attention, we simultaneously introduce grid features and region features extracted by Faster R-CNN. We obtain a global feature vector for each type of original features through mean pooling. The two types of features are encoded by two parallel encoders. Each encoder has two inputs: a set of feature vectors (region/grid) and the corresponding global feature vector. Each encoding layer outputs an encoded global feature vector and a set of encoded feature vectors. We cross-fuse the global feature vector output by each encoding layer for region features with the set of encoded feature vectors for grid features. In the same way, we cross-fuse the other pair: the global feature vector (grid) and the set of encoded feature vectors (region). Finally, we fuse the two global feature vectors output by the two encoders as the final global features, and the two sets of encoded feature vectors output by the two encoders as the final visual features. Experimental results on the COCO dataset show that our model achieves a new SOTA performance of BLEU-1 81.5%, BLEU-4 40.5%, METEOR 29.6%, and ROUGE 59.5% on the Karpathy test split.
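The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the encoders are omitted, and the abstract does not specify the fusion operators, so `cross_fuse` (softmax dot-product attention followed by averaging, used here in place of X-Linear attention), the averaging of the two global vectors, and the concatenation of the two feature sets are all assumptions made for illustration. The function names, feature counts (36 regions, 49 grid cells), and dimension (8) are likewise hypothetical.

```python
import numpy as np


def global_feature(features: np.ndarray) -> np.ndarray:
    """Mean-pool a set of feature vectors of shape (n, d) into one (d,) vector."""
    return features.mean(axis=0)


def cross_fuse(global_vec: np.ndarray, feature_set: np.ndarray) -> np.ndarray:
    """Hypothetical cross-fusion of one type's global vector with the other
    type's feature set. ASSUMPTION: softmax attention plus averaging stands in
    for the paper's (unspecified here) fusion operator."""
    scores = feature_set @ global_vec / np.sqrt(global_vec.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    attended = weights @ feature_set         # weighted sum over the set, shape (d,)
    return 0.5 * (global_vec + attended)


rng = np.random.default_rng(0)
region = rng.normal(size=(36, 8))  # stand-in for Faster R-CNN region features
grid = rng.normal(size=(49, 8))    # stand-in for grid features

# One global vector per feature type, obtained by mean pooling.
g_region = global_feature(region)
g_grid = global_feature(grid)

# Cross-fusion: region global with grid set, and grid global with region set.
fused_region_global = cross_fuse(g_region, grid)
fused_grid_global = cross_fuse(g_grid, region)

# ASSUMPTION: final fusion by averaging (global) and concatenation (sets).
final_global = 0.5 * (fused_region_global + fused_grid_global)
final_visual = np.concatenate([region, grid], axis=0)
```

In this sketch `final_global` has shape `(8,)` and `final_visual` has shape `(85, 8)`; in the method described above, the same two steps would be applied to the outputs of each encoding layer rather than to the raw features.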
Pages: 8