Adaptive semantic guidance network for video captioning

Authors
Liu, Yuanyuan [1]
Zhu, Hong [1]
Wu, Zhong [2]
Du, Sen [3]
Wu, Shuning [1]
Shi, Jing [1]
Affiliations
[1] Xian Univ Technol, Sch Automat & Informat Engn, Xian 710048, Shaanxi, Peoples R China
[2] Yuncheng Univ, Shanxi Prov Intelligent Optoelect Sensing Applicat, Yuncheng 044000, Shanxi, Peoples R China
[3] Air Force Engn Univ, Informat & Nav Coll, Xian 710051, Shaanxi, Peoples R China
Keywords
Video captioning; Adaptive semantic guidance network; Semantic enhancement encoder; Adaptive control decoder
DOI
10.1016/j.cviu.2024.104255
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video captioning aims to describe video content in natural language, and effectively integrating visual and textual information is crucial for generating accurate captions. However, we find that existing methods over-rely on the language priors acquired from text during training, so the model tends to output high-frequency fixed phrases. To address this problem, we extract high-quality semantic information from the multi-modal input and then build a semantic guidance mechanism that adapts the contributions of visual semantics and textual semantics to caption generation. We propose an Adaptive Semantic Guidance Network (ASGNet) for video captioning. ASGNet consists of a Semantic Enhancement Encoder (SEE) and an Adaptive Control Decoder (ACD). Specifically, the SEE helps the model obtain high-quality semantic representations by exploring the rich semantic information in the visual and textual inputs. The ACD dynamically adjusts the contribution weights of visual and textual semantics for word generation, guiding the model to adaptively focus on the correct semantic information. The two modules work together to help the model overcome its over-reliance on language priors, resulting in more accurate video captions. Finally, we conducted extensive experiments on commonly used video captioning datasets. Our method reaches the state of the art on MSVD and MSR-VTT and also achieves good performance on YouCookII. These experiments verify the advantages of our method.
Pages: 13
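
The abstract says the decoder dynamically adjusts the contribution weights of visual and textual semantics for each generated word. The record gives no equations, so the following is only a minimal sketch of one common way to do this kind of weighting: a scalar sigmoid gate computed from the decoder state and both semantic vectors. It is not the authors' ACD. The function name `adaptive_fuse`, the parameters `w_gate` and `b_gate`, and the choice of a single scalar gate are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(visual, textual, hidden, w_gate, b_gate=0.0):
    """Blend visual and textual semantic vectors with a learned scalar gate.

    Hypothetical illustration, not the published ASGNet decoder.
    gate  = sigmoid(w_gate . [hidden; visual; textual] + b_gate)
    fused = gate * visual + (1 - gate) * textual
    """
    if len(visual) != len(textual):
        raise ValueError("visual and textual vectors must have equal length")
    features = list(hidden) + list(visual) + list(textual)
    if len(features) != len(w_gate):
        raise ValueError("w_gate must match the concatenated feature length")

    # One score per decoding step decides how much to trust each modality.
    score = sum(w * f for w, f in zip(w_gate, features)) + b_gate
    gate = sigmoid(score)

    # Convex combination: gate near 1 favors visual, near 0 favors textual.
    fused = [gate * v + (1.0 - gate) * t for v, t in zip(visual, textual)]
    return gate, fused


if __name__ == "__main__":
    visual = [1.0, 0.0]
    textual = [0.0, 1.0]
    hidden = [0.5]
    # Zero weights give a neutral gate; a large bias pushes toward visual.
    print(adaptive_fuse(visual, textual, hidden, [0.0] * 5))
    print(adaptive_fuse(visual, textual, hidden, [0.0] * 5, b_gate=10.0))
```

In a trained model the gate parameters would be learned jointly with the rest of the decoder, so the gate could shift toward visual semantics when the language prior alone would produce a generic phrase.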