Semantic similarity information discrimination for video captioning

Cited by: 3
Authors
Du, Sen [1]
Zhu, Hong [1]
Xiong, Ge [1]
Lin, Guangfeng [2]
Wang, Dong [1]
Shi, Jing [1]
Wang, Jing [2]
Xing, Nan [1]
Affiliations
[1] Xian Univ Technol, Sch Automation & Informat Engn, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
[2] Xian Univ Technol, Informat Sci Dept, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
Keywords
Video captioning; Semantic detection; Bilinear pooling; Channel attention; Natural language processing; NETWORK;
DOI
10.1016/j.eswa.2022.118985
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video captioning aims to automatically describe the objects in a video and their actions using natural language sentences. A correct understanding of both visual and language information is critical for this task. Many existing methods fuse different features to generate sentences, but the resulting sentences often contain improper nouns and verbs. Inspired by the success of fine-grained visual recognition, we treat the problem of improper words as one of discriminating semantic similarity information. In this paper, we design a semantic bilinear block (SBB) that widens the gap between the probabilities of existing and nonexistent words and can capture more fine-grained features to discriminate semantic information. Moreover, our linear attention block (LAB) implements channelwise attention for 1-D features by simplifying the squeeze-and-excitation structure. Finally, we design a semantic discrimination network (SDN) that integrates the LAB and SBB into the video encoder and decoder, leveraging channelwise attention and discriminating semantic similarity information for better video captioning. Experiments on two widely used datasets, MSVD and MSR-VTT, demonstrate that the proposed SDN achieves better performance than state-of-the-art methods.
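The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the internals of the LAB or SBB, so the code below shows only the generic techniques they build on, namely a squeeze-and-excitation style gate applied directly to a 1-D feature vector, and outer-product bilinear pooling with signed square root and L2 normalization. The function names, dimensions, and random weights are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_1d(feature, w_reduce, w_expand):
    """SE-style channel gate for a 1-D feature vector (hypothetical helper).

    A 1-D feature has no spatial axes, so the usual global-pooling
    "squeeze" step is skipped; only the bottleneck "excitation" remains.
    """
    hidden = np.maximum(w_reduce @ feature, 0.0)  # bottleneck projection + ReLU
    gate = sigmoid(w_expand @ hidden)             # per-channel weight in (0, 1)
    return feature * gate                         # rescale each channel


def bilinear_pool(x, y):
    """Generic bilinear pooling of two vectors (hypothetical helper)."""
    z = np.outer(x, y).ravel()                    # all pairwise products
    z = np.sign(z) * np.sqrt(np.abs(z))           # signed square root
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z            # L2 normalization


# Illustrative sizes and random weights; none of these come from the paper.
rng = np.random.default_rng(0)
channels, reduced, semantic_dim = 8, 2, 4
feature = rng.standard_normal(channels)
w_reduce = rng.standard_normal((reduced, channels)) * 0.1
w_expand = rng.standard_normal((channels, reduced)) * 0.1
semantic = rng.standard_normal(semantic_dim)

attended = channel_attention_1d(feature, w_reduce, w_expand)
fused = bilinear_pool(attended, semantic)
print(attended.shape, fused.shape)
```

The gate keeps the feature dimensionality unchanged, while bilinear pooling expands it to the product of the two input sizes, which is why practical systems typically follow it with a projection or use a compact approximation.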
Pages: 12