Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

Cited by: 16
Authors
Li, Yuanning [1 ,2 ]
Tian, Yonghong [3 ]
Duan, Ling-Yu [3 ]
Yang, Jingjing [1 ,2 ]
Huang, Tiejun [3 ]
Gao, Wen [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
[3] Peking Univ, Natl Engn Lab Video Technol, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Keywords
Sequence multi-labeling; spatial correlation; temporal correlation; video annotation; CONCEPT FUSION; FRAMEWORK
DOI
10.1109/TMM.2010.2066960
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. In most existing works, annotation is formulated as a multi-labeling problem over individual shots. However, video is by nature informative in spatial and temporal context of semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Different from many video annotation paradigms working on individual shots, SML aims to predict a multi-label sequence for consecutive shots in a global optimization manner by incorporating spatial and temporal context into a unified learning framework. A novel discriminative method, called sequence multi-label support vector machine (SVMSML), is accordingly proposed to infer the multi-label sequence for a given shot sequence. In SVMSML, a joint kernel is employed to model the feature-level and concept-level context relationships (i.e., the dependencies of concepts on the low-level features, spatial and temporal correlations of concepts). A multiple-kernel learning (MKL) algorithm is developed to optimize the kernel weights of the joint kernel as well as the SML score function. To efficiently search the desirable multi-label sequence over the large output space in both training and test phases, we adopt an approximate method to maximize the energy of a binary Markov random field (BMRF). Extensive experiments on TRECVID'05 and TRECVID'07 datasets have shown that our proposed SVMSML gains superior performance over the state-of-the-art.
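Illustrative note (not part of the original record): the abstract describes searching for a multi-label sequence that maximizes a score over a binary Markov random field with feature-level, spatial (concept co-occurrence within a shot) and temporal (concept persistence across consecutive shots) terms. The sketch below is a toy version of that idea only. It is not the authors' method: the abstract does not specify their approximate inference procedure, so plain iterated conditional modes (ICM) is swapped in, and the names `score`, `icm`, `unary`, `spatial` and `temporal`, as well as all weight values, are hypothetical and hand-picked rather than learned by any kernel method.

```python
from itertools import product


def score(y, unary, spatial, temporal):
    """Score of a binary label grid y[t][c] (T shots x C concepts).

    unary[t][c]   : hypothetical per-shot evidence for concept c in shot t
    spatial[c][d] : hypothetical reward when concepts c and d co-occur in a shot
    temporal[c]   : hypothetical reward when concept c keeps its label
                    between consecutive shots
    """
    T, C = len(y), len(y[0])
    s = 0.0
    for t in range(T):
        for c in range(C):
            s += unary[t][c] * y[t][c]
            for d in range(c + 1, C):
                s += spatial[c][d] * y[t][c] * y[t][d]
            if t + 1 < T:
                s += temporal[c] * (y[t][c] == y[t + 1][c])
    return s


def icm(unary, spatial, temporal, max_sweeps=20):
    """Iterated conditional modes: flip one label at a time, keep improvements.

    This is a generic local search and only reaches a local maximum.
    """
    T, C = len(unary), len(unary[0])
    # Start from the context-free decision: label is on when evidence is positive.
    y = [[1 if unary[t][c] > 0 else 0 for c in range(C)] for t in range(T)]
    best = score(y, unary, spatial, temporal)
    for _ in range(max_sweeps):
        changed = False
        for t, c in product(range(T), range(C)):
            y[t][c] ^= 1  # tentatively flip this label
            new = score(y, unary, spatial, temporal)
            if new > best:
                best, changed = new, True
            else:
                y[t][c] ^= 1  # revert the flip
        if not changed:
            break
    return y, best


# Toy data: 3 shots, 2 concepts. Only concept 0 in shots 0 and 2 has
# positive evidence; every other entry is weakly negative.
unary = [[2.0, -0.5], [-0.2, -0.5], [2.0, -0.5]]
spatial = [[0.0, 0.8], [0.8, 0.0]]  # concepts 0 and 1 tend to co-occur
temporal = [1.0, 0.0]               # concept 0 tends to persist over time

labels, best = icm(unary, spatial, temporal)
print(labels, best)
```

In this toy run the context-free start labels only concept 0 in the first and last shots; the temporal term then turns concept 0 on in the middle shot, and the spatial term turns concept 1 on wherever concept 0 is present, which is the kind of effect the abstract attributes to using spatial and temporal context jointly.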
Pages: 814-828
Page count: 15
Related papers
50 in total
  • [21] An Adaptive Scheme for Compressed Video Steganography Using Temporal and Spatial Features of the Video Signal
    Mansouri, Jafar
    Khademi, Morteza
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2009, 19 (04) : 306 - 315
  • [22] AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition
    Wang, Yulin
    Yue, Yang
    Xu, Xinhong
    Hassani, Ali
    Kulikov, Victor
    Orlov, Nikita
    Song, Shiji
    Shi, Humphrey
    Huang, Gao
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 226 - 243
  • [23] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
    Yan, Shilin
    Zhang, Renrui
    Guo, Ziyu
    Chen, Wenchao
    Zhang, Wei
    Li, Hongyang
    Qiao, Yu
    Dong, Hao
    He, Zhongjiang
    Gao, Peng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6449 - 6457
  • [24] Exploiting spatial-temporal context for trajectory based action video retrieval
    Lelin Zhang
    Zhiyong Wang
    Tingting Yao
    Shin’ichi Staoh
    Tao Mei
    David Dagan Feng
    Multimedia Tools and Applications, 2018, 77 : 2057 - 2081
  • [25] Exploiting spatial-temporal context for trajectory based action video retrieval
    Zhang, Lelin
    Wang, Zhiyong
    Yao, Tingting
    Staoh, Shin'ichi
    Mei, Tao
    Feng, David Dagan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (02) : 2057 - 2081
  • [26] Spatial-Temporal Color Video Reconstruction From Noisy CFA Sequence
    Zhang, Lei
    Dong, Weisheng
    Wu, Xiaolin
    Shi, Guangming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2010, 20 (06) : 838 - 847
  • [27] Spatial-temporal error detection scheme for video transmission over noisy channels
    Wu, Guan-Lin
    Chien, Shao-Yi
    ISM 2007: NINTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, PROCEEDINGS, 2007, : 78 - +
  • [28] Spatial-temporal consistent labeling for multi-camera multi-object surveillance systems
    Chang, Jing-Ying
    Wang, Tzu-Heng
    Chien, Shao-Yi
    Chen, Liang-Gee
    PROCEEDINGS OF 2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-10, 2008, : 3530 - +
  • [29] SELF-LEARNED VIDEO SUPER-RESOLUTION WITH AUGMENTED SPATIAL AND TEMPORAL CONTEXT
    Fan, Zejia
    Liu, Jiaying
    Yang, Wenhan
    Xiang, Wei
    Guo, Zongming
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1925 - 1929
  • [30] Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video
    Zhao, Weichao
    Hu, Hezhen
    Zhou, Wengang
    Li, Li
    Li, Houqiang
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (06)