An Unsupervised Video Summarization Method Based on Multimodal Representation

Cited by: 0

Authors:

Lei, Zhuo [1,2]

Yu, Qiang [1]

Shou, Lidan [2]

Li, Shengquan [1]

Mao, Yunqing [1]

Affiliations:

[1] City Cloud Technol China Co Ltd, Hangzhou, Peoples R China

[2] Zhejiang Univ, Hangzhou, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT V | 2023 / Vol. 14090

Keywords:

Video Summarization; Multi-modal Representation Learning; Unsupervised Learning;

DOI:

10.1007/978-981-99-4761-4_15

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A good video summary should convey the whole story and feature the most important content. However, the importance of video content is often subjective, and users should be able to personalize the summary by specifying in natural language what is important to them. Moreover, existing methods usually use only visual cues to solve generic video summarization tasks, whereas this work introduces a single unsupervised multi-modal framework that addresses both generic and query-focused video summarization. We use a multi-head attention model to represent the multi-modal features, and we apply a Transformer-based model to learn frame scores using representative, diversity and reconstruction losses. In particular, we develop a novel representative loss that trains the model on both visual and semantic information. Our method outperforms previous state-of-the-art work on both generic and query-focused video summarization datasets.
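
The abstract names a diversity loss but does not define it. As a rough illustration only, the sketch below uses one formulation that is common in unsupervised summarization work: the score-weighted mean pairwise cosine similarity between frame features, so that giving high scores to similar frames is penalized. The function names `cosine` and `diversity_loss`, the toy features, and the formulation itself are assumptions for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def diversity_loss(features, scores):
    """Score-weighted mean pairwise cosine similarity over distinct frame pairs.

    Hypothetical formulation (not from the paper): lower values mean the
    highly scored frames are more dissimilar from one another.
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(len(features)):
        for j in range(len(features)):
            if i == j:
                continue  # skip self-pairs
            weight = scores[i] * scores[j]
            numerator += weight * cosine(features[i], features[j])
            denominator += weight
    return numerator / denominator if denominator > 0.0 else 0.0


# Toy example: frames 0 and 1 are identical, frame 2 is orthogonal to both.
toy_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
redundant = diversity_loss(toy_features, [1.0, 1.0, 0.0])  # selects the two identical frames
diverse = diversity_loss(toy_features, [1.0, 0.0, 1.0])    # selects two orthogonal frames
```

In this toy case the redundant selection yields a higher loss than the diverse one, which is the behavior such a term is meant to encourage during training.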

Pages: 171-180

Page count: 10
