Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

Cited by: 16

Authors:

Li, Yuanning ^{[1,2]}

Tian, Yonghong ^{[3]}

Duan, Ling-Yu ^{[3]}

Yang, Jingjing ^{[1,2]}

Huang, Tiejun ^{[3]}

Gao, Wen ^{[3]}

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China

[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China

[3] Peking Univ, Natl Engn Lab Video Technol, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2010 / Vol. 12 / Issue 08

Keywords:

Sequence multi-labeling; spatial correlation; temporal correlation; video annotation; CONCEPT FUSION; FRAMEWORK;

DOI:

10.1109/TMM.2010.2066960

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. In most existing works, annotation is formulated as a multi-labeling problem over individual shots. However, video by nature carries rich spatial and temporal context among semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Unlike many video annotation paradigms that work on individual shots, SML aims to predict a multi-label sequence for consecutive shots in a global optimization manner by incorporating spatial and temporal context into a unified learning framework. A novel discriminative method, called sequence multi-label support vector machine (SVMSML), is accordingly proposed to infer the multi-label sequence for a given shot sequence. In SVMSML, a joint kernel is employed to model the feature-level and concept-level context relationships (i.e., the dependencies of concepts on the low-level features, and the spatial and temporal correlations of concepts). A multiple-kernel learning (MKL) algorithm is developed to optimize the kernel weights of the joint kernel as well as the SML score function. To efficiently search for the desired multi-label sequence over the large output space in both the training and test phases, we adopt an approximate method to maximize the energy of a binary Markov random field (BMRF). Extensive experiments on the TRECVID'05 and TRECVID'07 datasets show that the proposed SVMSML outperforms state-of-the-art methods.
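
The abstract names the ingredients of the inference step (per-concept evidence, spatial correlation within a shot, temporal correlation across consecutive shots, and approximate maximization over a binary Markov random field) but gives neither the score function nor the approximate method. The sketch below is therefore only an illustration under stated assumptions, not the authors' method: the functions `sequence_score` and `icm_decode`, the parameters `unary`, `spatial`, and `temporal`, and the use of iterated conditional modes (ICM) as the approximate maximizer are all hypothetical choices made here.

```python
import numpy as np


def sequence_score(Y, unary, spatial, temporal):
    """Score a binary label matrix Y (T shots x K concepts).

    Assumed (hypothetical) terms, not taken from the paper:
      unary[t, k]   evidence for concept k in shot t
      spatial[k, l] reward when concepts k and l co-occur in one shot
                    (symmetric matrix with a zero diagonal)
      temporal[k]   reward when concept k keeps its value in adjacent shots
    """
    Y = np.asarray(Y, dtype=float)
    score = np.sum(unary * Y)
    # Symmetric spatial matrix: halve so each unordered pair counts once.
    score += 0.5 * np.sum((Y @ spatial) * Y)
    # Temporal smoothness between consecutive shots, per concept.
    agree = (Y[1:] == Y[:-1]).astype(float)
    score += np.sum(agree * temporal)
    return float(score)


def icm_decode(unary, spatial, temporal, max_sweeps=20):
    """Greedy coordinate ascent (ICM) over the binary label field.

    Starts from thresholded unary evidence and flips one label at a time,
    keeping a flip only if it strictly raises the score. This finds a
    local maximum, not necessarily the global one.
    """
    T, K = unary.shape
    Y = (unary > 0).astype(int)
    best = sequence_score(Y, unary, spatial, temporal)
    for _ in range(max_sweeps):
        changed = False
        for t in range(T):
            for k in range(K):
                Y[t, k] ^= 1  # tentatively flip this label
                cand = sequence_score(Y, unary, spatial, temporal)
                if cand > best:
                    best, changed = cand, True
                else:
                    Y[t, k] ^= 1  # revert the flip
        if not changed:
            break
    return Y, best


if __name__ == "__main__":
    # Toy data: concept 0 is weakly rejected in the middle shot, but the
    # temporal term favours keeping it consistent with its neighbours.
    unary = np.array([[2.0, -1.0], [-0.5, -1.0], [2.0, -1.0]])
    spatial = np.zeros((2, 2))
    temporal = np.array([1.0, 0.0])
    labels, score = icm_decode(unary, spatial, temporal)
    print(labels)
    print(score)
```

In the toy data, labeling each shot independently would drop concept 0 from the middle shot, whereas joint decoding over the sequence can recover it through the temporal term. This is the kind of effect the sequence-level formulation is meant to capture.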

Pages: 814 - 828

Page count: 15

