Semantic similarity information discrimination for video captioning

Cited by: 3
|
Authors
Du, Sen [1 ]
Zhu, Hong [1 ]
Xiong, Ge [1 ]
Lin, Guangfeng [2 ]
Wang, Dong [1 ]
Shi, Jing [1 ]
Wang, Jing [2 ]
Xing, Nan [1 ]
Affiliations
[1] Xian Univ Technol, Sch Automation & Informat Engn, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
[2] Xian Univ Technol, Informat Sci Dept, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
Keywords
Video captioning; Semantic detection; Bilinear pooling; Channel attention; Natural language processing; Network
DOI
10.1016/j.eswa.2022.118985
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video captioning aims to automatically describe objects and their actions in videos using natural language sentences. Correctly understanding visual and linguistic information is critical for this task. Many existing methods fuse different features to generate sentences, but the resulting sentences often contain improper nouns and verbs. Inspired by the successes of fine-grained visual recognition, we treat the problem of improper words as one of discriminating semantically similar information. In this paper, we design a semantic bilinear block (SBB) that widens the gap between the probabilities of existing and nonexistent words, capturing more fine-grained features to discriminate semantic information. Moreover, our linear attention block (LAB) implements channelwise attention for 1-D features by simplifying the squeeze-and-excitation structure. Furthermore, we design a semantic discrimination network (SDN) that integrates the LAB and SBB into the video encoder and decoder, leveraging channelwise attention and discriminating semantically similar information for better video captioning. Experiments on two widely used datasets, MSVD and MSR-VTT, demonstrate that the proposed SDN outperforms state-of-the-art methods.
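The abstract describes the linear attention block (LAB) as channelwise attention for a 1-D feature obtained by simplifying the squeeze-and-excitation structure. The following numpy sketch illustrates the general SE-style gating mechanism on a 1-D feature vector; the function name, weight shapes, and reduction ratio are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def linear_attention_block(feat, w1, w2):
    """Hypothetical SE-style channelwise attention for a 1-D feature:
    a bottleneck projection (C -> C/r) with ReLU, an expansion
    (C/r -> C) with sigmoid, and an elementwise gate on the input."""
    z = np.maximum(w1 @ feat, 0.0)   # reduce to C/r channels (ReLU)
    gate = sigmoid(w2 @ z)           # per-channel weights in (0, 1)
    return feat * gate               # reweight the input channels

# Toy example: C = 4 channels, assumed reduction ratio r = 2.
rng = np.random.default_rng(0)
C, r = 4, 2
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
feat = rng.standard_normal(C)
out = linear_attention_block(feat, w1, w2)
```

Because the sigmoid gate lies strictly in (0, 1), each output channel is a damped copy of the corresponding input channel; the attention decides how much of each channel to pass through rather than adding new information.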
Pages: 12
Related Papers
50 records in total
  • [1] Video Captioning with Semantic Information from the Knowledge Base
    Wang, Dan
    Song, Dandan
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (IEEE ICBK 2017), 2017, : 224 - 229
  • [2] Dense video captioning using unsupervised semantic information
    Estevam, Valter
    Laroca, Rayson
    Pedrini, Helio
    Menotti, David
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2025, 107
  • [3] Video Captioning with Semantic Guiding
    Yuan, Jin
    Tian, Chunna
    Zhang, Xiangnan
    Ding, Yuxuan
    Wei, Wei
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [4] Semantic Grouping Network for Video Captioning
    Ryu, Hobin
    Kang, Sunghun
    Kang, Haeyong
    Yoo, Chang D.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2514 - 2522
  • [5] Video Captioning with Transferred Semantic Attributes
    Pan, Yingwei
    Yao, Ting
    Li, Houqiang
    Mei, Tao
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 984 - 992
  • [6] Semantic guidance network for video captioning
    Guo, Lan
    Zhao, Hong
    Chen, Zhiwen
    Han, Zeyu
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [7] Video Captioning with Visual and Semantic Features
    Lee, Sujin
    Kim, Incheol
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2018, 14 (06): : 1318 - 1330
  • [9] Incorporating Textual Similarity in Video Captioning Schemes
    Gkountakos, Konstantinos
    Dimou, Anastasios
    Papadopoulos, Georgios Th.
    Daras, Petros
    2019 IEEE INTERNATIONAL CONFERENCE ON ENGINEERING, TECHNOLOGY AND INNOVATION (ICE/ITMC), 2019,
  • [10] Improving distinctiveness in video captioning with text-video similarity
    Velda, Vania
    Immanuel, Steve Andreas
    Hendria, Willy Fitra
    Jeong, Cheol
    IMAGE AND VISION COMPUTING, 2023, 136