Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning

Cited by: 9

Authors:

Yan, Yichao [1]

Zhuang, Ning [1]

Ni, Bingbing [1]

Zhang, Jian [1]

Xu, Minghao [1]

Zhang, Qiang [1]

Zheng, Zhang [1]

Cheng, Shuo [1]

Tian, Qi [3]

Xu, Yi [1]

Yang, Xiaokang [2]

Zhang, Wenjun [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China

[2] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China

[3] Univ Texas San Antonio, San Antonio, TX 78249 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022 / Vol. 44 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Video caption; representation learning; graphCNN; fine-grained; multiple granularity; SEGMENTATION;

DOI:

10.1109/TPAMI.2019.2946823

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning to generate continuous, highly detailed linguistic descriptions of multi-subject interactive videos has particular applications in team sports auto-narrative. Compared with traditional video captioning, this task is more challenging because it requires simultaneously modeling fine-grained individual actions, uncovering the spatio-temporal dependency structures of frequent group interactions, and then accurately mapping these complex interaction details into long and detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to progressively extract the interactive actions among subjects, encoding both intra- and inter-team interactions. Based on these multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions at multiple spatio-temporal resolutions. The two modules are integrated seamlessly and work collaboratively to generate the final narrative. In addition, to facilitate reproducible research, we collect a new video dataset from YouTube.com called the Sports Video Narrative dataset (SVN). It represents a novel direction: it contains 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narratives (i.e., sentences). Furthermore, because previous metrics such as METEOR (used in the coarse-grained video captioning task) do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interaction structure.
Extensive experiments on our SVN dataset have demonstrated the effectiveness of the proposed framework for fine-grained team sports video auto-narrative.


Pages: 666-683

Number of pages: 18
