A Multi-modal System for Video Semantic Understanding

Citations: 0

Authors:

Lv, Zhengwei [1]

Lei, Tao [1]

Liang, Xiao [1]

Shi, Zhizhong [1]

Liu, Duoxing [1]

Affiliations:

[1] Autohome Inc., Beijing, People's Republic of China

Source:

CCKS 2021 - EVALUATION TRACK | 2022 / Vol. 1553

Keywords:

Multi-modal representation; Semantic understanding; Video

DOI:

10.1007/978-981-19-0713-5_5

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a video semantic understanding system based on multi-modal data fusion. The system includes two sub-models, the video classification tag model (VCT) and the video semantic tag model (VST), which generate classification tags and semantic tags for videos, respectively. The VCT model uses a bidirectional LSTM and an attention mechanism to integrate the video features, which improves results over other methods. The VST model extracts semantic tags directly from text data with a combined RoBERTa and CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.
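The abstract says the VCT model integrates video features with an attention mechanism. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below pools a sequence of per-frame feature vectors into one vector using dot-product attention. The function name `attention_pool`, the `query` vector (standing in for a learned parameter), and the toy frame vectors (standing in for bidirectional LSTM outputs) are all assumptions made for this example.

```python
import math


def attention_pool(frames, query):
    """Fuse per-frame feature vectors into one vector via dot-product attention.

    frames: list of equal-length feature vectors (hypothetical stand-ins for
            the per-frame outputs of a bidirectional LSTM).
    query:  vector of the same length (hypothetical stand-in for a learned
            attention parameter).
    Returns the pooled vector and the attention weights.
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]

    # Softmax over the scores, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frame vectors, one dimension at a time.
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


# Toy input: three frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
pooled, weights = attention_pool(frames, query)
```

Frames that align more closely with the query receive larger weights, so they contribute more to the pooled vector. In a trained model the query would be learned jointly with the rest of the network rather than fixed by hand.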


Pages: 34-43

Page count: 10
