Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph

Cited by: 0

Authors:

Shixing Han

Jin Liu

Jinyingming Zhang

Peizhu Gong

Xiliang Zhang

Huihua He

Affiliations:

[1] Shanghai Maritime University, College of Information Engineering

[2] Shanghai Normal University, College of Early Childhood Education

Source:

Complex & Intelligent Systems | 2023 / Vol. 9

Keywords:

Dense video captioning; Cross-modal attention; Commonsense reasoning; Heterogeneous knowledge; Unbiased scene graph

DOI:

Not available

Abstract:

Dense video captioning (DVC) aims to generate a description for each scene in a video. Despite considerable progress on this task, previous works usually concentrate only on visual features and neglect the audio information in the video, which leads to inaccurate localization of scene events. In this article, we propose a novel DVC model named CMCR, which is mainly composed of a cross-modal processing (CM) module and a commonsense reasoning (CR) module. CM uses a cross-modal attention mechanism to encode data from different modalities. An event refactoring algorithm is proposed to handle inaccurate event localization caused by overlapping events. In addition, a shared encoder is used to reduce model redundancy. CR improves the logic of the generated captions using both heterogeneous prior knowledge and reasoning over entity associations, achieved by building a knowledge-enhanced unbiased scene graph. Extensive experiments on the ActivityNet Captions dataset demonstrate that our model outperforms state-of-the-art methods. To better understand the performance of CMCR, we also conduct ablation experiments that analyze the contributions of the different modules.
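The abstract does not spell out the exact form of CMCR's cross-modal attention. As a generic illustration only (not the authors' implementation), the sketch below shows standard scaled dot-product attention in which queries from one modality, assumed here to be visual frame features, attend over keys and values from another modality, assumed here to be audio features. The function names `softmax` and `cross_modal_attention` and the toy feature vectors are hypothetical; a trained model would typically add learned projection matrices and multiple heads.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Hypothetical sketch: scaled dot-product attention across two modalities.

    queries: one row per element of the target modality (assumed visual frames).
    keys/values: one row per element of the source modality (assumed audio segments).
    Returns one fused vector per query: a softmax-weighted mix of the value rows.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same number of rows")
    scale = 1.0 / math.sqrt(len(keys[0]))
    fused_rows = []
    for q in queries:
        # Scaled similarity between this visual query and every audio key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the audio value vectors, one output dimension at a time.
        fused = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        fused_rows.append(fused)
    return fused_rows


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]              # two toy frame features
    audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # three toy audio features
    print(cross_modal_attention(visual, audio, audio))
```

Each output row is a convex combination of the audio value vectors, so every visual position receives audio context weighted by how similar its query is to each audio key.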

Pages: 4995-5012

Number of pages: 17
