Overview of Tencent Multi-modal Ads Video Understanding

Authors
Wang, Zhenzhi [1]
Li, Zhimin [2]
Wu, Liyu [3]
Xiong, Jiangfeng [4]
Lu, Qinglin [4]
Affiliations
[1] Nanjing Univ, Nanjing, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Tencent Data Platform, Shenzhen, Peoples R China
Keywords
Multi-modal Video Analysis; Temporal Segmentation; Multi-label Classification
DOI
10.1145/3474085.3479222
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Multi-modal Ads Video Understanding Challenge is the first grand challenge aimed at comprehensive understanding of ads videos. The challenge includes two tasks: video structuring and multi-label classification. Video structuring asks participants to accurately predict both the scene boundaries and the multi-label categories of each scene, based on a fine-grained, ads-related category hierarchy. This task advances the foundation of comprehensive ads video understanding, which has a significant impact on many applications in ads, such as video recommendation and user behavior analysis. This paper presents an overview of the video structuring task in our grand challenge, including the background of ads videos, a detailed description of the task, our proposed dataset, the evaluation protocol, and our baseline model. By ablating the key components of our baseline, we aim to reveal the main challenges of this task and provide useful guidance for future research in this area.
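To make the video structuring output concrete, the following is a minimal Python sketch of how a prediction (scene boundaries plus a label set per scene) might be represented and compared against an annotated scene by temporal overlap. This record does not specify the challenge's submission format or evaluation protocol, so everything here is an assumption for illustration: the `Scene` type, the helper names, the label strings, and the use of temporal IoU as the overlap measure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """One predicted or annotated scene: a time span plus its label set."""
    start: float        # seconds
    end: float          # seconds
    labels: frozenset   # multi-label categories (names below are hypothetical)


def scenes_from_boundaries(boundaries, duration, labels_per_scene):
    """Turn sorted interior boundaries into contiguous scenes covering [0, duration]."""
    edges = [0.0, *boundaries, duration]
    if len(labels_per_scene) != len(edges) - 1:
        raise ValueError("need exactly one label set per scene")
    return [
        Scene(s, e, frozenset(lbls))
        for s, e, lbls in zip(edges[:-1], edges[1:], labels_per_scene)
    ]


def temporal_iou(a, b):
    """Intersection over union of two time spans (assumed overlap measure)."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 15-second ad split into three scenes at 4 s and 10 s.
predicted = scenes_from_boundaries(
    boundaries=[4.0, 10.0],
    duration=15.0,
    labels_per_scene=[{"product_display"}, {"narration", "indoor"}, {"call_to_action"}],
)
annotated = Scene(5.0, 10.0, frozenset({"narration"}))

overlap = temporal_iou(predicted[1], annotated)
shared = predicted[1].labels & annotated.labels
print(overlap, sorted(shared))
```

Representing each scene as a span with a set of labels keeps the two halves of the task (where a scene starts and ends, and what it contains) in one record, so a boundary score and a label score can be computed from the same pair of objects.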
Pages: 4725 - 4729
Number of pages: 5