Semantic Enhanced Video Captioning with Multi-feature Fusion

Times cited: 3
Authors
Niu, Tian-Zi [1]
Dong, Shan-Shan [1]
Chen, Zhen-Duo [1]
Luo, Xin [1]
Guo, Shanqing [2]
Huang, Zi [3]
Xu, Xin-Shun [1]
Affiliations
[1] Shandong University, School of Software, Jinan 250101, People's Republic of China
[2] Shandong University, School of Cyber Science and Technology, Qingdao 266237, People's Republic of China
[3] University of Queensland, School of Information Technology and Electrical Engineering, Brisbane, Australia
Funding
National Natural Science Foundation of China
Keywords
Video captioning; semantic encoder; discrete selection; multi-feature fusion; NETWORK
DOI
10.1145/3588572
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Video captioning aims to automatically describe a video clip with informative sentences. Deep learning-based models have become the mainstream approach to this task and achieve competitive results on public datasets. These methods usually generate sentences from several types of features, e.g., semantic information and 2D or 3D visual features. However, some methods treat semantic information only as a complement to visual representations and therefore cannot fully exploit it, and some ignore the relationships between different types of features. In addition, most of them select multiple frames from a video with an equally spaced sampling scheme, which introduces substantial redundant information. To address these issues, we present a novel video captioning framework, Semantic Enhanced video captioning with Multi-feature Fusion (SEMF). It optimizes the use of different types of features in three ways. First, a semantic encoder uses a semantic dictionary to enhance meaningful semantic features and thereby boost performance. Second, a discrete selection module attends to important features and obtains different contexts at different steps to reduce feature redundancy. Finally, a multi-feature fusion module uses a novel relation-aware attention mechanism to separate the common and complementary components of different features, providing more effective visual features for the next step. Moreover, the entire framework can be trained end-to-end. Extensive experiments on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets show that SEMF achieves state-of-the-art results.
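The abstract does not give SEMF's actual formulations, so the following Python sketch only illustrates the general ideas it contrasts, under loudly stated assumptions. `uniform_sample_indices` shows the equally spaced sampling scheme the abstract criticizes. `topk_select_context` is a hypothetical stand-in for discrete selection: a hard top-k choice of frames scored by a dot product with a per-step query, followed by softmax attention over the kept frames only. `split_common_complementary` is a hypothetical geometric stand-in (vector projection) for separating common and complementary components of two features; it is not the paper's relation-aware attention. All function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def uniform_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Equally spaced sampling: split the clip into num_samples equal segments
    and take the (floored) centre frame of each segment."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_select_context(
    query: Vector, features: Sequence[Vector], k: int
) -> Tuple[List[int], List[float]]:
    """Hypothetical discrete selection: score every frame feature against the
    current query, keep only the k best frames (a hard choice), and return a
    softmax-weighted context computed over those frames alone."""
    if k <= 0 or not features:
        return [], []
    scores = [dot(query, f) for f in features]
    # sorted() stays stable with reverse=True, so ties keep the earlier frame.
    keep = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    dim = len(features[0])
    context = [sum(w * features[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
    return sorted(keep), context


def split_common_complementary(a: Vector, b: Vector) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for common/complementary separation: the projection
    of a onto b is the 'common' part; the residual, orthogonal to b, is the
    'complementary' part, so a == common + complement."""
    denom = dot(b, b)
    if denom == 0:
        return [0.0] * len(a), list(a)
    coef = dot(a, b) / denom
    common = [coef * x for x in b]
    complement = [x - c for x, c in zip(a, common)]
    return common, complement
```

For example, with frame features `[[1, 0], [0, 1], [1, 1], [0, 0]]`, the query `[2.0, 1.0]` with `k=2` keeps frames 0 and 2, while the query `[-1.0, 1.0]` with `k=1` keeps frame 1. Because the query changes from step to step, the same frame features yield a different context at each step, whereas uniform sampling fixes the frame set once for the whole sentence.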
Pages: 21