Semantic similarity information discrimination for video captioning

Cited by: 3
Authors
Du, Sen [1 ]
Zhu, Hong [1 ]
Xiong, Ge [1 ]
Lin, Guangfeng [2 ]
Wang, Dong [1 ]
Shi, Jing [1 ]
Wang, Jing [2 ]
Xing, Nan [1 ]
Affiliations
[1] Xian Univ Technol, Sch Automation & Informat Engn, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
[2] Xian Univ Technol, Informat Sci Dept, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
Keywords
Video captioning; Semantic detection; Bilinear pooling; Channel attention; Natural language processing; NETWORK;
DOI
10.1016/j.eswa.2022.118985
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video captioning aims to automatically describe the objects in a video and their actions using natural language sentences. A correct understanding of both visual and language information is critical for this task. Many existing methods fuse different features to generate sentences; however, the generated sentences often contain improper nouns and verbs. Inspired by the success of fine-grained visual recognition, we treat the problem of improper words as one of discriminating semantically similar information. In this paper, we design a semantic bilinear block (SBB) that widens the gap between the probabilities of existing and nonexistent words, capturing more fine-grained features for discriminating semantic information. Moreover, our linear attention block (LAB) implements channelwise attention for 1-D features by simplifying the squeeze-and-excitation structure. Furthermore, we design a semantic discrimination network (SDN) that integrates the LAB and SBB into the video encoder and decoder, leveraging channelwise attention and discriminating semantically similar information for better video captioning. Experiments on two widely used datasets, MSVD and MSR-VTT, demonstrate that the proposed SDN achieves better performance than state-of-the-art methods.
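The abstract names two building blocks without giving their layer-level details: channelwise attention on a 1-D feature (a simplified squeeze-and-excitation gate) and bilinear pooling of two feature vectors. The following is a generic NumPy sketch of those two textbook operations, not the authors' LAB or SBB. The function names, feature sizes, single-layer gate, and the signed-square-root normalization are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feature, weight, bias):
    """SE-style gate for a 1-D feature (hypothetical simplification).

    A 1-D feature has no spatial axes to squeeze, so a single linear
    layer followed by a sigmoid produces one weight in (0, 1) per channel.
    """
    gate = sigmoid(feature @ weight + bias)
    return feature * gate


def bilinear_pool(x, y):
    """Plain bilinear pooling: flattened outer product of two vectors,
    followed by signed square root and L2 normalization (a common
    convention in fine-grained recognition, assumed here)."""
    z = np.outer(x, y).ravel()
    z = np.sign(z) * np.sqrt(np.abs(z))
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z


# Toy inputs; the dimensions are arbitrary placeholders.
rng = np.random.default_rng(0)
visual = rng.standard_normal(8)      # stand-in for a video feature
semantic = rng.standard_normal(4)    # stand-in for a semantic feature
W = rng.standard_normal((8, 8)) * 0.1
b = np.zeros(8)

gated = channel_gate(visual, W, b)       # shape (8,)
fused = bilinear_pool(gated, semantic)   # shape (32,)
```

The outer product exposes every pairwise interaction between the two inputs, which is why bilinear features are often used to separate visually or semantically similar classes; the cost is an output size equal to the product of the input sizes.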
Pages: 12
Related papers
50 in total
  • [21] Structured Encoding Based on Semantic Disambiguation for Video Captioning
    Sun, Bo
    Tian, Jinyu
    Wu, Yong
    Yu, Lunjun
    Tang, Yuanyan
    COGNITIVE COMPUTATION, 2024, 16 (03) : 1032 - 1048
  • [22] Video captioning with stacked attention and semantic hard pull
    Rahman, Md Mushfiqur
    Abedin, Thasin
    Prottoy, Khondokar S. S.
    Moshruba, Ayana
    Siddiqui, Fazlul Hasan
    PEERJ COMPUTER SCIENCE, 2021, 7 : 1 - 18
  • [23] Richer Semantic Visual and Language Representation for Video Captioning
    Tang, Pengjie
    Wang, Hanli
    Wang, Hanzhang
    Xu, Kaisheng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1871 - 1876
  • [24] Semantic Tag Augmented XlanV Model for Video Captioning
    Huang, Yiqing
    Xue, Hongwei
    Chen, Jiansheng
    Ma, Huimin
    Ma, Hongbing
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4818 - 4822
  • [25] Audio Captioning with Composition of Acoustic and Semantic Information
    Eren, Aysegul Ozkaya
    Sert, Mustafa
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2021, 15 (02) : 143 - 160
  • [26] Exploiting the local temporal information for video captioning
    Wei, Ran
    Mi, Li
    Hu, Yaosi
    Chen, Zhenzhong
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 67 (67)
  • [27] Efficient Video Captioning with Frame Similarity-Based Filtering
    Rashno, Elyas
    Zulkernine, Farhana
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2023, PT II, 2023, 14147 : 98 - 112
  • [28] STSI: Efficiently Mine Spatio-Temporal Semantic Information between Different Multimodal for Video Captioning
    Xiong, Huiyu
    Wang, Lanxiao
    2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022,
  • [29] Semantic Similarity Based Video Reranking
    Sang, Miaojie
    Sun, Zhonghua
    Jia, Kebin
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 1420 - 1423
  • [30] A Video Captioning Method by Semantic Topic-Guided Generation
    Ye, Ou
    Wei, Xinli
    Yu, Zhenhua
    Fu, Yan
    Yang, Ying
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (01) : 1071 - 1093