Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph

Cited by: 0

Authors:

Shixing Han

Jin Liu

Jinyingming Zhang

Peizhu Gong

Xiliang Zhang

Huihua He

Affiliations:

[1] Shanghai Maritime University, College of Information Engineering

[2] Shanghai Normal University, College of Early Childhood Education

Source:

Complex & Intelligent Systems | 2023 / Vol. 9

Keywords:

Dense video captioning; Cross-modal attention; Commonsense reasoning; Heterogeneous knowledge; Unbiased scene graph

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Dense video captioning (DVC) aims to generate a description for each scene in a video. Despite encouraging progress on this task, previous works usually concentrate on exploiting visual features while neglecting the audio information in the video, resulting in inaccurate scene event localization. In this article, we propose a novel DVC model named CMCR, which is mainly composed of a cross-modal processing (CM) module and a commonsense reasoning (CR) module. CM utilizes a cross-modal attention mechanism to encode data in different modalities. An event refactoring algorithm is proposed to deal with inaccurate event localization caused by overlapping events. In addition, a shared encoder is used to reduce model redundancy. CR improves the logic of generated captions using both heterogeneous prior knowledge and reasoning over entity associations, achieved by building a knowledge-enhanced unbiased scene graph. Extensive experiments are conducted on the ActivityNet Captions dataset, and the results demonstrate that our model achieves better performance than state-of-the-art methods. To better understand the performance achieved by CMCR, we also conduct ablation experiments to analyze the contributions of different modules.

Pages: 4995-5012

Page count: 17
