A Multi-modal System for Video Semantic Understanding

Authors
Lv, Zhengwei [1]
Lei, Tao [1]
Liang, Xiao [1]
Shi, Zhizhong [1]
Liu, Duoxing [1]
Affiliations
[1] Autohome Inc., Beijing, People's Republic of China
Source
CCKS 2021 - EVALUATION TRACK | 2022 / Vol. 1553
Keywords
Multi-modal representation; Semantic understanding; Video
DOI
10.1007/978-981-19-0713-5_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a video semantic understanding system based on multi-modal data fusion. The system comprises two sub-models, the video classification tag (VCT) model and the video semantic tag (VST) model, which generate classification tags and semantic tags for videos, respectively. The VCT model uses a bidirectional LSTM and an attention mechanism to integrate the video features, which improves its results over other methods. The VST model extracts semantic tags directly from text data using a combined RoBERTa and CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.
Pages: 34-43
Page count: 10