A Multi-modal System for Video Semantic Understanding

Authors
Lv, Zhengwei [1]
Lei, Tao [1]
Liang, Xiao [1]
Shi, Zhizhong [1]
Liu, Duoxing [1]
Affiliations
[1] Autohome Inc., Beijing, People's Republic of China
Source
CCKS 2021 - EVALUATION TRACK | 2022 / Vol. 1553
Keywords
Multi-modal representation; Semantic understanding; Video;
DOI
10.1007/978-981-19-0713-5_5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper proposes a video semantic understanding system based on multi-modal data fusion. The system comprises two sub-models, the video classification tag model (VCT) and the video semantic tag model (VST), which generate classification tags and semantic tags for videos, respectively. The VCT model uses a bidirectional LSTM and an attention mechanism to integrate the video features, which improves results compared with other methods. The VST model extracts semantic tags directly from text data using a combined RoBERTa and CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.
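The abstract does not say which form of attention the VCT model uses to integrate the video features. As a rough illustration only, the sketch below assumes simple dot-product attention pooling: each per-frame vector (standing in for a bidirectional LSTM output) is scored against a single query vector, the scores are normalized with a softmax, and the frames are averaged with those weights. The function name `attention_pool`, the `query` vector, and the toy frame values are hypothetical and are not taken from the paper.

```python
import math


def attention_pool(states, query):
    """Collapse a sequence of per-frame vectors into one vector.

    states: list of T vectors, each of length D (stand-ins for BiLSTM outputs)
    query:  hypothetical scoring vector of length D (learned in a real model)
    Returns the pooled vector and the attention weights.
    """
    # Unnormalized relevance score per frame: dot(state, query)
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]

    # Numerically stable softmax over the time axis
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the frame vectors
    dim = len(states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three frames with two features each (made-up values)
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(frames, query=[2.0, 0.0])
```

With this query, the first and third frames score equally and higher than the second, so they dominate the pooled vector. In a trained model the query (or a small scoring network) would be learned jointly with the LSTM.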
Pages: 34-43
Number of pages: 10