Adaptive semantic guidance network for video captioning

Times cited: 0

Authors:

Liu, Yuanyuan [1]

Zhu, Hong [1]

Wu, Zhong [2]

Du, Sen [3]

Wu, Shuning [1]

Shi, Jing [1]

Affiliations:

[1] Xian Univ Technol, Sch Automat & Informat Engn, Xian 710048, Shaanxi, Peoples R China

[2] Yuncheng Univ, Shanxi Prov Intelligent Optoelect Sensing Applicat, Yuncheng 044000, Shanxi, Peoples R China

[3] Air Force Engn Univ, Informat & Nav Coll, Xian 710051, Shaanxi, Peoples R China

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2025 / Vol. 251

Keywords:

Video captioning; Adaptive semantic guidance network; Semantic enhancement encoder; Adaptive control decoder

DOI:

10.1016/j.cviu.2024.104255

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video captioning aims to describe video content in natural language, and effectively integrating visual and textual information is crucial for generating accurate captions. However, we find that existing methods over-rely on the textual language priors learned during training, so the model tends to output high-frequency, fixed phrases. To address this problem, we extract high-quality semantic information from the multi-modal input and build a semantic guidance mechanism that adaptively balances the contributions of visual semantics and textual semantics to caption generation. Specifically, we propose an Adaptive Semantic Guidance Network (ASGNet) for video captioning, which consists of a Semantic Enhancement Encoder (SEE) and an Adaptive Control Decoder (ACD). The SEE helps the model obtain high-quality semantic representations by exploring the rich semantic information in both the visual and the textual inputs. The ACD dynamically adjusts the contribution weights of visual and textual semantics during word generation, guiding the model to adaptively focus on the correct semantic information. Together, these two modules help the model overcome its over-reliance on language priors and produce more accurate video captions. Extensive experiments on commonly used video captioning datasets show that ASGNet achieves state-of-the-art performance on MSVD and MSR-VTT and also performs well on YouCookII, verifying the advantages of our method.
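The abstract does not give the ACD's equations. To illustrate the general idea of adaptively weighting visual versus textual semantics at each decoding step, here is a minimal sketch of a generic sigmoid gate. It assumes that the decoder hidden state, a visual semantic vector, and a textual semantic vector of equal size are already available. All names (`adaptive_fuse`, `word_distribution`, `gate_weights`, `gate_bias`, `output_matrix`) and the single scalar gate are hypothetical simplifications. This is not the authors' ASGNet implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product; refuses silently truncated mismatched inputs."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> Vector:
    """Stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(
    hidden: Sequence[float],
    visual: Sequence[float],
    textual: Sequence[float],
    gate_weights: Sequence[float],  # hypothetical learned parameters
    gate_bias: float,               # hypothetical learned parameter
) -> Tuple[float, Vector]:
    """Hypothetical gated fusion of visual and textual semantics.

    The gate g is computed from the decoder state and both semantic vectors.
    g weights the visual semantics and (1 - g) weights the textual semantics,
    so the balance between the two can change at every decoding step.
    """
    if len(visual) != len(textual):
        raise ValueError("visual and textual vectors must have the same size")
    features = list(hidden) + list(visual) + list(textual)
    g = sigmoid(dot(gate_weights, features) + gate_bias)
    fused = [g * v + (1.0 - g) * t for v, t in zip(visual, textual)]
    return g, fused


def word_distribution(fused: Sequence[float], output_matrix: Sequence[Sequence[float]]) -> Vector:
    """Project the fused vector to one logit per vocabulary word, then normalize."""
    return softmax([dot(row, fused) for row in output_matrix])


if __name__ == "__main__":
    hidden = [0.0, 0.0]
    visual = [1.0, 0.0]
    textual = [0.0, 1.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy two-word vocabulary

    # Neutral gate: visual and textual semantics contribute equally.
    g, fused = adaptive_fuse(hidden, visual, textual, [0.0] * 6, 0.0)
    print(g, fused)  # 0.5 [0.5, 0.5]

    # Gate biased toward visual semantics: word 0 becomes more probable.
    g, fused = adaptive_fuse(hidden, visual, textual, [0.0] * 6, 4.0)
    print(round(g, 3), word_distribution(fused, identity))
```

With a neutral gate the two sources are mixed equally. Shifting the gate toward the visual vector moves probability mass to the word that the visual semantics support, which is the kind of behavior the abstract attributes to adaptive control over language priors. A real decoder would learn the gate parameters and would typically use a per-dimension gate over high-dimensional features.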

Pages: 13