Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

Cited by: 16
Authors
Li, Yuanning [1, 2]
Tian, Yonghong [3 ]
Duan, Ling-Yu [3 ]
Yang, Jingjing [1, 2]
Huang, Tiejun [3 ]
Gao, Wen [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
[3] Peking Univ, Natl Engn Lab Video Technol, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Keywords
Sequence multi-labeling; spatial correlation; temporal correlation; video annotation; CONCEPT FUSION; FRAMEWORK
DOI
10.1109/TMM.2010.2066960
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. Most existing work formulates annotation as a multi-labeling problem over individual shots. However, video by nature carries rich spatial and temporal context among semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Unlike the many video annotation paradigms that work on individual shots, SML aims to predict a multi-label sequence for consecutive shots through global optimization, incorporating spatial and temporal context into a unified learning framework. A novel discriminative method, called sequence multi-label support vector machine (SVMSML), is accordingly proposed to infer the multi-label sequence for a given shot sequence. In SVMSML, a joint kernel is employed to model the feature-level and concept-level context relationships (i.e., the dependencies of concepts on the low-level features, and the spatial and temporal correlations of concepts). A multiple-kernel learning (MKL) algorithm is developed to optimize the kernel weights of the joint kernel as well as the SML score function. To efficiently search for the desired multi-label sequence over the large output space in both the training and test phases, we adopt an approximate method to maximize the energy of a binary Markov random field (BMRF). Extensive experiments on the TRECVID'05 and TRECVID'07 datasets show that the proposed SVMSML outperforms state-of-the-art methods.
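The idea of scoring a whole multi-label sequence, rather than each shot independently, can be illustrated with a small sketch. It is not the authors' SVMSML: the score here is a plain sum of a feature-driven term, a within-shot (spatial) concept-pair term, and a between-shot (temporal) term, and the search uses iterated conditional modes (ICM), a generic greedy approximation that stands in for the approximate BMRF method the abstract mentions. The names `sequence_score`, `icm_decode`, `unary`, `spatial`, and `temporal` are hypothetical, and all weights are assumed to be given rather than learned.

```python
import numpy as np

def sequence_score(Y, unary, spatial, temporal):
    """Score a binary label sequence Y of shape (T shots, K concepts).

    unary[t, k]   : assumed per-shot evidence for concept k (e.g. a classifier score)
    spatial[j, k] : assumed reward when concepts j and k co-occur in one shot
                    (symmetric, zero diagonal; y^T S y counts each pair twice)
    temporal[j, k]: assumed reward when concept j in shot t is followed by k in shot t+1
    """
    score = float(np.sum(unary * Y))
    score += sum(float(Y[t] @ spatial @ Y[t]) for t in range(len(Y)))
    score += sum(float(Y[t] @ temporal @ Y[t + 1]) for t in range(len(Y) - 1))
    return score

def icm_decode(unary, spatial, temporal, sweeps=10):
    """Greedy approximate search: flip one label at a time if the score improves."""
    T, K = unary.shape
    Y = (unary > 0).astype(int)  # start from independent per-shot decisions
    best = sequence_score(Y, unary, spatial, temporal)
    for _ in range(sweeps):
        changed = False
        for t in range(T):
            for k in range(K):
                Y[t, k] ^= 1  # tentatively flip this label
                candidate = sequence_score(Y, unary, spatial, temporal)
                if candidate > best:
                    best, changed = candidate, True
                else:
                    Y[t, k] ^= 1  # revert the flip
        if not changed:
            break  # local optimum reached
    return Y, best
```

Because a flip is kept only when it raises the score, the result is never worse than the independent per-shot starting point, but it is only a local optimum; the context terms can switch on concepts whose individual evidence is weak when co-occurring or preceding concepts support them.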
Pages: 814-828
Number of pages: 15
Related papers
50 in total
  • [31] Spatial-Temporal Sequence Attention Based Efficient Transformer for Video Snow Removal
    Gao, Tao
    Zhang, Qianxi
    Chen, Ting
    Wen, Yuanbo
    BIG DATA MINING AND ANALYTICS, 2025, 8 (03): 551-562
  • [32] Joint learning of video scene detection and annotation via multi-modal adaptive context network
    Xu, Yifei
    Pan, Litong
    Sang, Weiguang
    Luo, Hailun
    Li, Li
    Wei, Pingping
    Zhu, Li
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [33] Spatial and temporal data parallelization of multi-view video encoding algorithm
    Pang, Yi
    Sun, Lifeng
    Guo, Songliu
    Yang, Shiqiang
    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007: 441-444
  • [34] Spatial-Temporal Multi-level Association for Video Object Segmentation
    Miao, Deshui
    Li, Xin
    He, Zhenyu
    Lu, Huchuan
    Yang, Ming-Hsuan
    COMPUTER VISION - ECCV 2024, PT LXVII, 2025, 15125: 91-107
  • [35] Modified key sequence-based video watermarking scheme resistant to temporal synchronization attacks
    Dong, Jing
    Huang, Hua
    Zhou, Quan
    Qi, Chun
    2008 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING, VOLS 1 AND 2, PROCEEDINGS, 2008: 1698-1702
  • [36] SSTtrack: A unified hyperspectral video tracking framework via modeling spectral-spatial-temporal conditions
    Chen, Yuzeng
    Yuan, Qiangqiang
    Tang, Yuqi
    Xiao, Yi
    He, Jiang
    Han, Te
    Liu, Zhenqi
    Zhang, Liangpei
    INFORMATION FUSION, 2025, 114
  • [37] Temporal and Spatial Coherent Pulse Combining by Multi-path Interferometric Scheme
    Jang, Jin
    Jeong, Hee Won
    Joo, Ki-Nam
    INTERNATIONAL JOURNAL OF PRECISION ENGINEERING AND MANUFACTURING, 2019, 20 (01): 93-100
  • [38] Temporal and Spatial Coherent Pulse Combining by Multi-path Interferometric Scheme
    Jin Jang
    Hee Won Jeong
    Ki-Nam Joo
    International Journal of Precision Engineering and Manufacturing, 2019, 20: 93-100
  • [39] Context-dependent Viewpoint Sequence Recommendation System for Multi-view Video
    Wang, Xueting
    Muramatu, Yuki
    Hirayama, Takatsugu
    Mase, Kenji
    2014 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2014: 195-202
  • [40] Semantic propagation network with robust spatial context descriptors for multi-class object labeling
    Wei, Ping
    Liu, Yuehu
    Zheng, Nanning
    Zhai, Shaozhuo
    NEURAL COMPUTING & APPLICATIONS, 2014, 24 (05): 1003-1018