Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning

Cited by: 9
Authors
Yan, Yichao [1 ]
Zhuang, Ning [1 ]
Ni, Bingbing [1 ]
Zhang, Jian [1 ]
Xu, Minghao [1 ]
Zhang, Qiang [1 ]
Zheng, Zhang [1 ]
Cheng, Shuo [1 ]
Tian, Qi [3 ]
Xu, Yi [1 ]
Yang, Xiaokang [2 ]
Zhang, Wenjun [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
[3] Univ Texas San Antonio, San Antonio, TX 78249 USA
Funding
National Natural Science Foundation of China;
Keywords
Video caption; representation learning; graphCNN; fine-grained; multiple granularity; SEGMENTATION;
DOI
10.1109/TPAMI.2019.2946823
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning to generate continuous linguistic descriptions for multi-subject interactive videos in great detail has particular applications in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging, as it requires simultaneous modeling of fine-grained individual actions, uncovering of the spatio-temporal dependency structures of frequent group interactions, and accurate mapping of these complex interaction details into long and detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to extract the interactive actions among subjects in a progressive way, encoding both intra- and inter-team interactions. Based on these multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions at multiple spatio-temporal resolutions. Both modules are integrated seamlessly and work collaboratively to generate the final narrative. Meanwhile, to facilitate reproducible research, we collect a new video dataset from YouTube.com called the Sports Video Narrative (SVN) dataset. It is novel in that it contains 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narratives (e.g., sentences). Furthermore, as previous metrics such as METEOR (i.e., used in the coarse-grained video captioning task) do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interactional structure.
Extensive experiments on our SVN dataset have demonstrated the effectiveness of the proposed framework for fine-grained team sports video auto-narrative.
Pages: 666-683
Number of pages: 18