Video Summarization with Long Short-Term Memory

Citations: 411
Authors
Zhang, Ke [1 ]
Chao, Wei-Lun [1 ]
Sha, Fei [2 ]
Grauman, Kristen [3 ]
Affiliations
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90007 USA
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[3] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Source
Computer Vision - ECCV 2016, Lecture Notes in Computer Science, Springer
Keywords
Video summarization; Long short-term memory; Speech recognition
DOI
10.1007/978-3-319-46478-7_47
CLC Classification
TP18 (Theory of Artificial Intelligence)
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the task as a structured prediction problem, our main idea is to use Long Short-Term Memory (LSTM) to model the variable-range temporal dependency among video frames, so as to derive both representative and compact video summaries. The proposed model successfully accounts for the sequential structure crucial to generating meaningful video summaries, leading to state-of-the-art results on two benchmark datasets. In addition to advances in modeling techniques, we introduce a strategy to address the need for a large amount of annotated data for training complex learning approaches to summarization. There, our main idea is to exploit auxiliary annotated video summarization datasets, in spite of their heterogeneity in visual styles and contents. Specifically, we show that domain adaptation techniques can improve learning by reducing the discrepancies in the original datasets' statistical properties.
Pages: 766-782
Page count: 17
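
To make the abstract's core idea concrete, below is a minimal PyTorch sketch of supervised keyframe scoring with a bidirectional LSTM: per-frame features are read in sequence, the recurrent states capture variable-range temporal dependencies, and a linear head predicts a per-frame importance score. This is an illustrative approximation only, not the authors' exact vsLSTM/dppLSTM architecture; the class name, feature dimension, and hidden size are assumptions.

import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    """Hypothetical sketch: scores each video frame for keyframe selection."""
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        # A bidirectional LSTM models temporal dependencies in both
        # directions along the frame sequence.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # per-frame importance score

    def forward(self, frames):             # frames: (B, T, feat_dim)
        h, _ = self.lstm(frames)           # h: (B, T, 2 * hidden)
        return self.head(h).squeeze(-1)    # scores: (B, T)

# Usage: score a 300-frame clip (stand-in CNN features) and take the
# 15 highest-scoring frames as the keyframe summary.
model = FrameScorer()
feats = torch.randn(1, 300, 1024)
scores = model(feats)
keyframes = scores.topk(15, dim=1).indices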
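The abstract's second idea, pooling heterogeneous auxiliary datasets after reducing discrepancies in their statistical properties, can be illustrated with a simple stand-in: standardizing each source dataset's frame features to zero mean and unit variance before joint training, so their first- and second-order statistics agree. This sketch is not the paper's specific domain adaptation technique; the function and data shapes are assumptions.

import numpy as np

def standardize_per_dataset(datasets):
    """datasets: list of (num_frames_i, feat_dim) feature arrays,
    one per source dataset. Returns per-dataset standardized copies."""
    aligned = []
    for feats in datasets:
        mu = feats.mean(axis=0, keepdims=True)
        sigma = feats.std(axis=0, keepdims=True) + 1e-8
        aligned.append((feats - mu) / sigma)  # align dataset statistics
    return aligned

# Usage with two hypothetical source datasets of different scales:
d1 = np.random.randn(5000, 1024) * 3.0 + 1.0
d2 = np.random.randn(8000, 1024) * 0.5 - 2.0
a1, a2 = standardize_per_dataset([d1, d2])
print(a1.mean(), a2.mean())  # both approximately 0 after alignment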