Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

Cited by: 16
Authors
Li, Yuanning [1, 2]
Tian, Yonghong [3]
Duan, Ling-Yu [3]
Yang, Jingjing [1, 2]
Huang, Tiejun [3]
Gao, Wen [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
[3] Peking Univ, Natl Engn Lab Video Technol, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Keywords
Sequence multi-labeling; spatial correlation; temporal correlation; video annotation; CONCEPT FUSION; FRAMEWORK
DOI
10.1109/TMM.2010.2066960
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. Most existing work formulates annotation as a multi-labeling problem over individual shots. However, video by nature carries rich spatial and temporal context among semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Unlike the many video annotation paradigms that work on individual shots, SML aims to predict a multi-label sequence for consecutive shots through global optimization, incorporating spatial and temporal context into a unified learning framework. A novel discriminative method, called the sequence multi-label support vector machine (SVMSML), is accordingly proposed to infer the multi-label sequence for a given shot sequence. In SVMSML, a joint kernel models the feature-level and concept-level context relationships (i.e., the dependencies of concepts on the low-level features, and the spatial and temporal correlations of concepts). A multiple-kernel learning (MKL) algorithm is developed to optimize the kernel weights of the joint kernel together with the SML score function. To efficiently search for the desired multi-label sequence over the large output space in both the training and test phases, we adopt an approximate method to maximize the energy of a binary Markov random field (BMRF). Extensive experiments on the TRECVID'05 and TRECVID'07 datasets show that the proposed SVMSML outperforms state-of-the-art methods.
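
The idea of scoring a whole multi-label sequence, rather than each shot on its own, can be illustrated with a small sketch. This is not the paper's SVMSML: the abstract does not give the score function or name the approximate inference method, so the sketch assumes explicit score tables in place of the learned joint kernel and uses iterated conditional modes (ICM) as a stand-in approximate maximizer. The names `sequence_score`, `icm_inference`, `unary`, `spatial`, and `temporal` are hypothetical, and "spatial" is assumed here to mean co-occurrence of concepts within one shot.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sequence_score(labels: List[List[int]], unary: Matrix,
                   spatial: Matrix, temporal: Matrix) -> float:
    """Score a binary label grid labels[t][c] (shot t, concept c).

    unary[t][c]    : assumed feature-based score for concept c in shot t
    spatial[c][d]  : assumed reward when c and d (c < d) share a shot
    temporal[c][d] : assumed reward when c is in shot t and d in shot t+1
    """
    num_shots, num_concepts = len(labels), len(labels[0])
    score = 0.0
    for t in range(num_shots):
        for c in range(num_concepts):
            if not labels[t][c]:
                continue
            score += unary[t][c]
            # Within-shot (spatial) co-occurrence, each pair counted once.
            for d in range(c + 1, num_concepts):
                if labels[t][d]:
                    score += spatial[c][d]
            # Consecutive-shot (temporal) transitions.
            if t + 1 < num_shots:
                for d in range(num_concepts):
                    if labels[t + 1][d]:
                        score += temporal[c][d]
    return score


def icm_inference(unary: Matrix, spatial: Matrix, temporal: Matrix,
                  max_sweeps: int = 20) -> Tuple[List[List[int]], float]:
    """Greedy bit-flip search (ICM); finds a local maximum only."""
    num_shots, num_concepts = len(unary), len(unary[0])
    # Start from the context-free, per-shot decision.
    labels = [[1 if unary[t][c] > 0 else 0 for c in range(num_concepts)]
              for t in range(num_shots)]
    best = sequence_score(labels, unary, spatial, temporal)
    for _ in range(max_sweeps):
        improved = False
        for t in range(num_shots):
            for c in range(num_concepts):
                labels[t][c] ^= 1  # tentatively flip one label
                candidate = sequence_score(labels, unary, spatial, temporal)
                if candidate > best + 1e-12:
                    best, improved = candidate, True
                else:
                    labels[t][c] ^= 1  # revert the flip
        if not improved:
            break
    return labels, best


if __name__ == "__main__":
    # Two shots, two concepts; all numbers are made up for illustration.
    unary = [[1.0, -0.2], [-0.1, -0.2]]
    spatial = [[0.0, 0.5], [0.0, 0.0]]
    temporal = [[0.3, 0.0], [0.0, 0.3]]
    print(icm_inference(unary, spatial, temporal))
```

In this toy input, labeling each shot independently would keep only the first concept in the first shot, while the sequence-level search also switches on the weakly supported labels because the context terms outweigh their small negative unary scores. A full recomputation of the score per flip is used for clarity; a practical version would update only the terms touched by the flipped label.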
Pages: 814-828
Number of pages: 15