Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

Cited by: 16
Authors
Li, Yuanning [1, 2]
Tian, Yonghong [3]
Duan, Ling-Yu [3]
Yang, Jingjing [1, 2]
Huang, Tiejun [3]
Gao, Wen [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
[3] Peking Univ, Natl Engn Lab Video Technol, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Keywords
Sequence multi-labeling; spatial correlation; temporal correlation; video annotation; CONCEPT FUSION; FRAMEWORK
DOI
10.1109/TMM.2010.2066960
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. In most existing work, annotation is formulated as a multi-labeling problem over individual shots. However, video is by nature rich in the spatial and temporal context of semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Unlike the many video annotation paradigms that work on individual shots, SML aims to predict a multi-label sequence for consecutive shots in a global optimization manner by incorporating spatial and temporal context into a unified learning framework. A novel discriminative method, called sequence multi-label support vector machine (SVMSML), is accordingly proposed to infer the multi-label sequence for a given shot sequence. In SVMSML, a joint kernel is employed to model the feature-level and concept-level context relationships (i.e., the dependencies of concepts on the low-level features, and the spatial and temporal correlations of concepts). A multiple-kernel learning (MKL) algorithm is developed to optimize the kernel weights of the joint kernel as well as the SML score function. To efficiently search for the desired multi-label sequence over the large output space in both the training and test phases, we adopt an approximate method to maximize the energy of a binary Markov random field (BMRF). Extensive experiments on the TRECVID'05 and TRECVID'07 datasets show that the proposed SVMSML achieves superior performance over the state of the art.
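The search over multi-label sequences described in the abstract can be illustrated with a small sketch. It is not the authors' SVMSML implementation: the abstract does not name the approximate method, so iterated conditional modes (ICM, a simple coordinate-ascent heuristic) is swapped in here, and the score function, the names `sequence_score` and `icm_inference`, and all numeric weights are hypothetical. The sketch assumes a score made of a per-shot feature term, a within-shot concept co-occurrence (spatial) term, and a between-shot label-agreement (temporal) term.

```python
from itertools import product


def sequence_score(labels, unary, spatial, temporal):
    """Score a binary multi-label sequence (hypothetical form, not the paper's).

    labels[t][k]  : 1 if concept k is present in shot t, else 0
    unary[t][k]   : feature-level evidence for concept k in shot t
    spatial[k][l] : reward when concepts k and l co-occur in one shot (k < l)
    temporal[k]   : reward when concept k keeps its label in adjacent shots
    """
    num_shots, num_concepts = len(labels), len(labels[0])
    total = 0.0
    for t in range(num_shots):
        for k in range(num_concepts):
            total += unary[t][k] * labels[t][k]
            for l in range(k + 1, num_concepts):
                total += spatial[k][l] * labels[t][k] * labels[t][l]
            if t + 1 < num_shots and labels[t][k] == labels[t + 1][k]:
                total += temporal[k]
    return total


def icm_inference(unary, spatial, temporal, max_sweeps=20):
    """Approximate maximization by flipping one label at a time (ICM)."""
    num_shots, num_concepts = len(unary), len(unary[0])
    # Start from the context-free decision: threshold the unary evidence.
    labels = [[1 if unary[t][k] > 0 else 0 for k in range(num_concepts)]
              for t in range(num_shots)]
    best = sequence_score(labels, unary, spatial, temporal)
    for _ in range(max_sweeps):
        changed = False
        for t, k in product(range(num_shots), range(num_concepts)):
            labels[t][k] ^= 1  # tentatively flip one label
            candidate = sequence_score(labels, unary, spatial, temporal)
            if candidate > best:
                best, changed = candidate, True
            else:
                labels[t][k] ^= 1  # revert: the flip did not help
        if not changed:  # local optimum reached
            break
    return labels, best


# Toy data (made up): 3 shots, 2 concepts.
unary = [[1.0, -0.2], [-0.1, 0.5], [0.8, -0.3]]
spatial = [[0.0, 0.6], [0.0, 0.0]]  # only the upper triangle is read
temporal = [0.5, 0.5]

labels, score = icm_inference(unary, spatial, temporal)
print(labels, score)
```

On this toy input, thresholding the unary evidence alone labels each shot with a single concept, whereas the context terms pull the sequence toward consistent labels across shots. ICM only guarantees a local optimum, so the result depends on the starting point.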
Pages: 814-828
Page count: 15
Related papers
50 in total
  • [1] EMPIRICAL ANALYSIS OF MULTI-LABELING ALGORITHMS FOR MUSIC EMOTION ANNOTATION
    Su, Ja-Hwung
    Tsai, Yi-Cheng
    Tseng, Vincent S.
    ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2013,
  • [2] Image Annotation with Parametric Mixture Model Based Multi-class Multi-labeling
    Wang, Zhiyong
    Siu, Wan-Chi
    Feng, Dagan
    2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 635 - 638
  • [3] Combination of classification and regression in decision tree for multi-labeling image annotation and retrieval
    Fakhari, Ali
    Moghadam, Amir Masoud Eftekhari
    APPLIED SOFT COMPUTING, 2013, 13 (02) : 1292 - 1302
  • [4] A temporal context model for boosting video annotation
    YI Jian
    PENG YuXin
    XIAO JianGuo
    Science China (Information Sciences), 2013, 56 (11) : 92 - 105
  • [5] A temporal context model for boosting video annotation
    Jian Yi
    YuXin Peng
    JianGuo Xiao
    Science China Information Sciences, 2013, 56 : 1 - 14
  • [6] A temporal context model for boosting video annotation
    Yi Jian
    Peng YuXin
    Xiao JianGuo
    SCIENCE CHINA-INFORMATION SCIENCES, 2013, 56 (11) : 1 - 14
  • [7] Spatial Role Labeling: Task Definition and Annotation Scheme
    Kordjamshidi, Parisa
    Van Otterlo, Martijn
    Moens, Marie-Francine
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010,
  • [8] Video Sequence Boundary Labeling with Temporal Coherence
    Bobak, Petr
    Cmolik, Ladislav
    Cadik, Martin
    ADVANCES IN COMPUTER GRAPHICS, CGI 2019, 2019, 11542 : 40 - 52
  • [9] Action Unit Detection with Region Adaptation, Multi-labeling Learning and Optimal Temporal Fusing
    Li, Wei
    Abtahi, Farnaz
    Zhu, Zhigang
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6766 - 6775
  • [10] Moving target detection and labeling in video sequence based on spatial-temporal information fusion
    Ma, Shiwei
    Liu, Zhongjie
    Yang, Banghua
    Wang, Jian
    BIO-INSPIRED COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2007, 4688 : 795 - 802