Visual Event Recognition in Videos by Learning from Web Data

Cited by: 40

Authors:

Duan, Lixin [1]

Xu, Dong [1]

Tsang, Ivor Wai-Hung [1]

Luo, Jiebo [2]

Affiliations:

[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore

[2] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2012 / Vol. 34 / No. 09

Keywords:

Event recognition; transfer learning; domain adaptation; cross-domain learning; adaptive MKL; aligned space-time pyramid matching; KERNEL; CONTEXT; IMAGES; SVM;

DOI:

10.1109/TPAMI.2011.265

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a visual event recognition framework for consumer videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). Observing that consumer videos generally contain large intraclass variations within the same type of events, we first propose a new method, called Aligned Space-Time Pyramid Matching (ASTPM), to measure the distance between any two video clips. Second, we propose a new transfer learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web video domain and consumer video domain). For each pyramid level and each type of local features, we first train a set of SVM classifiers based on the combined training set from two domains by using multiple base kernels from different kernel types and parameters, which are then fused with equal weights to obtain a prelearned average classifier. In A-MKL, for each event class we learn an adapted target classifier based on multiple base kernels and the prelearned average classifiers from this event class or all the event classes by minimizing both the structural risk functional and the mismatch between data distributions of two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data. We also conduct an in-depth investigation on various aspects of the proposed method A-MKL, such as the analysis on the combination coefficients on the prelearned classifiers, the convergence of the learning algorithm, and the performance variation by using different proportions of labeled consumer videos. 
Moreover, we show that A-MKL using the prelearned classifiers from all the event classes leads to better performance when compared with A-MKL using the prelearned classifiers only from each individual event class.
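
Two ideas in the abstract can be illustrated with a small, hedged sketch: fusing several base classifiers with equal weights into an average classifier, and scoring the mismatch between the data distributions of two domains. The abstract does not name the mismatch criterion, so the sketch assumes the squared Maximum Mean Discrepancy (MMD) with a Gaussian kernel. All function names, the `gamma` value, and the toy feature vectors are hypothetical and not taken from the paper.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (gamma is an assumed value)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(source, target, kernel=gaussian_kernel):
    """Biased estimate of the squared MMD between two samples.

    Assumption: MMD stands in for the 'mismatch between data
    distributions of two domains' mentioned in the abstract.
    """
    return (mean_kernel(source, source, kernel)
            + mean_kernel(target, target, kernel)
            - 2.0 * mean_kernel(source, target, kernel))


def average_classifier(decision_functions):
    """Fuse base classifiers with equal weights into one decision function."""
    def fused(x):
        return sum(f(x) for f in decision_functions) / len(decision_functions)
    return fused


# Toy stand-ins for video-level features from the two domains.
web_videos = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
consumer_videos = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]

# Hypothetical base classifiers, each returning a decision value.
base_classifiers = [lambda x: x[0] - 0.5, lambda x: x[1] - 0.5]
prelearned = average_classifier(base_classifiers)

print("MMD^2 (web vs. web):", mmd_squared(web_videos, web_videos))
print("MMD^2 (web vs. consumer):", mmd_squared(web_videos, consumer_videos))
print("Fused decision value:", prelearned([1.0, 1.0]))
```

In this toy setting the mismatch score is zero when a sample is compared with itself and grows as the two domains move apart. An adaptive method along the lines the abstract describes would trade such a term off against the classification loss, which this sketch does not attempt.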


Pages: 1667 - 1680

Page count: 14
