A Multi-modal System for Video Semantic Understanding

Cited by: 0

Authors:

Lv, Zhengwei [1]

Lei, Tao [1]

Liang, Xiao [1]

Shi, Zhizhong [1]

Liu, Duoxing [1]

Affiliations:

[1] Autohome Inc, Beijing, People's Republic of China

Source:

CCKS 2021 - EVALUATION TRACK | 2022 / Volume 1553

Keywords:

Multi-modal representation; Semantic understanding; Video

DOI:

10.1007/978-981-19-0713-5_5

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a video semantic understanding system based on multi-modal data fusion. The system comprises two sub-models, the video classification tag model (VCT) and the video semantic tag model (VST), which generate classification tags and semantic tags for videos, respectively. The VCT model uses a bidirectional LSTM and an attention mechanism to integrate the video features, which yields better results than the other methods compared. The VST model extracts semantic tags directly from text data using a combined RoBERTa and CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.
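The abstract does not say how the attention mechanism in the VCT model combines the bidirectional LSTM outputs. One common pattern is attention pooling, sketched below in plain Python as an illustration only. The function names, the dot-product scoring, the `query` vector and the toy `states` are all assumptions made for this example, not details from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Collapse a sequence of per-step vectors into a single vector.

    states: list of T vectors of length D (stand-ins for BiLSTM outputs)
    query:  hypothetical scoring vector of length D; in a trained model
            this would be learned, here it is fixed for illustration
    """
    # One scalar relevance score per time step (dot product with the query).
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    # Normalise the scores into weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the states, computed dimension by dimension.
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


# Toy stand-ins for three time steps of 2-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(states, query)
```

In this toy input the third step has the largest dot product with `query`, so it receives the largest weight and dominates the pooled vector. A classifier head would then consume `pooled` to predict classification tags.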


Pages: 34-43

Number of pages: 10

