A Multi-modal System for Video Semantic Understanding

Citations: 0
Authors
Lv, Zhengwei [1]
Lei, Tao [1]
Liang, Xiao [1]
Shi, Zhizhong [1]
Liu, Duoxing [1]
Affiliations
[1] Autohome Inc, Beijing, Peoples R China
Source
CCKS 2021 - EVALUATION TRACK | 2022, Vol. 1553
Keywords
Multi-modal representation; Semantic understanding; Video
DOI
10.1007/978-981-19-0713-5_5
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a video semantic understanding system based on multi-modal data fusion. The system comprises two sub-models, the video classification tag model (VCT) and the video semantic tag model (VST), which generate classification tags and semantic tags for videos, respectively. The VCT model integrates video features with a bidirectional LSTM and an attention mechanism, which improves results over the other methods compared. The VST model extracts semantic tags directly from text data using a combined RoBERTa-CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.
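
As a rough illustration of the VCT design summarized in the abstract, the following PyTorch sketch aggregates per-frame video features with a bidirectional LSTM and attention pooling before a tag classifier. All layer sizes, names, and the tag count are illustrative assumptions, not the authors' implementation; the VST side, which pairs RoBERTa with a CRF for sequence labeling over text, is not shown.

# Minimal sketch (assumed PyTorch) of a BiLSTM + attention fusion of frame features.
import torch
import torch.nn as nn

class VCTSketch(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=256, num_tags=30):
        super().__init__()
        # Bidirectional LSTM over the sequence of per-frame features.
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)        # scalar score per frame
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, frame_feats):                     # (batch, frames, feat_dim)
        h, _ = self.bilstm(frame_feats)                 # (batch, frames, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)    # attention over frames
        video_vec = (weights * h).sum(dim=1)            # fused video representation
        return self.classifier(video_vec)               # classification tag logits

# Usage with random tensors standing in for real video features.
logits = VCTSketch()(torch.randn(2, 16, 1024))
print(logits.shape)  # torch.Size([2, 30])
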
Pages: 34-43
Page count: 10
Related Papers
50 records in total
  • [1] A multi-modal system for the retrieval of semantic video events
    Amir, A
    Basu, S
    Iyengar, G
    Lin, CY
    Naphade, M
    Smith, JR
    Srinivasan, S
    Tseng, B
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2004, 96 (02) : 216 - 236
  • [2] Multi-modal fusion for video understanding
    Hoogs, A
    Mundy, J
    Cross, G
    30TH APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP, PROCEEDINGS: ANALYSIS AND UNDERSTANDING OF TIME VARYING IMAGERY, 2001, : 103 - 108
  • [3] Overview of Tencent Multi-modal Ads Video Understanding
    Wang, Zhenzhi
    Li, Zhimin
    Wu, Liyu
    Xiong, Jiangfeng
    Lu, Qinglin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4725 - 4729
  • [4] Generative Multi-Modal Mutual Enhancement Video Semantic Communications
    Chen, Yuanle
    Wang, Haobo
    Liu, Chunyu
    Wang, Linyi
    Liu, Jiaxin
    Wu, Wei
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2024, 139 (03): : 2985 - 3009
  • [5] Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
    Wu, Zichen
    Huang, HsiuYuan
    Qu, Fanyi
    Wu, Yunfang
    arXiv,
  • [6] MULTI-MODAL REPRESENTATION LEARNING FOR SHORT VIDEO UNDERSTANDING AND RECOMMENDATION
    Guo, Daya
    Hong, Jiangshui
    Luo, Binli
    Yan, Qirui
    Niu, Zhangming
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2019, : 687 - 690
  • [7] Deep Video Understanding with a Unified Multi-Modal Retrieval Framework
    Xie, Chen-Wei
    Sun, Siyang
    Zhao, Liming
    Wu, Jianmin
    Li, Dangwei
    Zheng, Yun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 7055 - 7059
  • [8] Towards Developing a Multi-Modal Video Recommendation System
    Pingali, Sriram
    Mondal, Prabir
    Chakder, Daipayan
    Saha, Sriparna
    Ghosh, Angshuman
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [9] How to Read Paintings: Semantic Art Understanding with Multi-modal Retrieval
    Garcia, Noa
    Vogiatzis, George
    COMPUTER VISION - ECCV 2018 WORKSHOPS, PT II, 2019, 11130 : 676 - 691
  • [10] Multi-modal Video Summarization
    Huang, Jia-Hong
    ICMR 2024 - Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024, : 1214 - 1218