A Multi-modal System for Video Semantic Understanding

Citations: 0

Authors:

Lv, Zhengwei [1]

Lei, Tao [1]

Liang, Xiao [1]

Shi, Zhizhong [1]

Liu, Duoxing [1]

Affiliations:

[1] Autohome Inc, Beijing, Peoples R China

Source:

CCKS 2021 - EVALUATION TRACK | 2022 / Vol. 1553

Keywords:

Multi-modal representation; Semantic understanding; Video;

DOI:

10.1007/978-981-19-0713-5_5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a video semantic understanding system based on multi-modal data fusion. The system includes two sub-models, the video classification tag model (VCT) and the video semantic tag model (VST), which generate classification tags and semantic tags for videos, respectively. The VCT model uses a bidirectional LSTM and an attention mechanism to integrate the video features, which effectively improves results compared with other methods. The VST model extracts semantic tags directly from text data using a combined RoBERTa and CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.


Pages: 34-43

Page count: 10

