Semantic Enhanced Video Captioning with Multi-feature Fusion

Cited by: 3
Authors
Niu, Tian-Zi [1]
Dong, Shan-Shan [1]
Chen, Zhen-Duo [1]
Luo, Xin [1]
Guo, Shanqing [2]
Huang, Zi [3]
Xu, Xin-Shun [1]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
[2] Shandong Univ, Sch Cyber Sci & Technol, Qingdao 266237, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Australia
Funding
National Natural Science Foundation of China;
Keywords
Video captioning; semantic encoder; discrete selection; multi-feature fusion; NETWORK;
DOI
10.1145/3588572
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Video captioning aims to automatically describe a video clip with informative sentences. At present, deep learning-based models have become the mainstream for this task and achieved competitive results on public datasets. Usually, these methods leverage different types of features to generate sentences, e.g., semantic information, 2D or 3D features. However, some methods only treat semantic information as a complement of visual representations and cannot fully exploit it; some of them ignore the relationship between different types of features. In addition, most of them select multiple frames of a video with an equally spaced sampling scheme, resulting in much redundant information. To address these issues, we present a novel video-captioning framework, Semantic Enhanced video captioning with Multi-feature Fusion, SEMF for short. It optimizes the use of different types of features from three aspects. First, a semantic encoder is designed to enhance meaningful semantic features through a semantic dictionary to boost performance. Second, a discrete selection module pays attention to important features and obtains different contexts at different steps to reduce feature redundancy. Finally, a multi-feature fusion module uses a novel relation-aware attention mechanism to separate the common and complementary components of different features to provide more effective visual features for the next step. Moreover, the entire framework can be trained in an end-to-end manner. Extensive experiments are conducted on Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets. The results demonstrate that SEMF is able to achieve state-of-the-art results.
Pages: 21