A Multi-modal System for Video Semantic Understanding

Authors
Lv, Zhengwei [1]
Lei, Tao [1]
Liang, Xiao [1]
Shi, Zhizhong [1]
Liu, Duoxing [1]
Affiliations
[1] Autohome Inc, Beijing, People's Republic of China
Source
CCKS 2021 - EVALUATION TRACK | 2022 / Vol. 1553
Keywords
Multi-modal representation; Semantic understanding; Video
DOI
10.1007/978-981-19-0713-5_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a video semantic understanding system based on multi-modal data fusion. The system comprises two sub-models, the video classification tag (VCT) model and the video semantic tag (VST) model, which generate classification tags and semantic tags for videos, respectively. The VCT model uses a bidirectional LSTM and an attention mechanism to integrate the video features, which improves its results compared with other methods. The VST model extracts semantic tags directly from text data using a combined RoBERTa and CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.
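The abstract says the VCT model integrates video features with an attention mechanism but does not give its formulation. As a rough illustration only, the sketch below shows generic dot-product attention pooling, which collapses a sequence of per-frame vectors into one video-level vector. The function name `attention_pool`, the `query` vector, and the toy inputs are all assumptions made for this example and do not come from the paper. The bidirectional LSTM that would normally produce the frame vectors is omitted.

```python
import math


def attention_pool(frame_features, query):
    """Pool frame vectors into one vector with softmax attention weights.

    frame_features: list of equal-length float lists (hypothetical per-frame
        features, e.g. the outputs of a sequence encoder).
    query: float list of the same length (a hypothetical learned vector).
    Returns (pooled_vector, weights).
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    # Dot-product score of each frame against the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the frames gives a single video-level vector.
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three frames with two features each (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_pooled, uniform_weights = attention_pool(frames, query=[0.0, 0.0])
focused_pooled, focused_weights = attention_pool(frames, query=[10.0, 0.0])
```

With an all-zero query every frame scores the same, so the pooling reduces to a plain average. A query that favours the first feature shifts the weight toward the frames where that feature is large.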
Pages: 34-43
Number of pages: 10