An Unsupervised Video Summarization Method Based on Multimodal Representation

Authors
Lei, Zhuo [1, 2]
Yu, Qiang [1]
Shou, Lidan [2]
Li, Shengquan [1]
Mao, Yunqing [1]
Affiliations
[1] City Cloud Technol China Co Ltd, Hangzhou, Peoples R China
[2] Zhejiang Univ, Hangzhou, Peoples R China
Keywords
Video Summarization; Multi-modal Representation Learning; Unsupervised Learning
DOI
10.1007/978-981-99-4761-4_15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A good video summary should convey the whole story and feature the most important content. However, the importance of video content is often subjective, so users should be able to personalize a summary by specifying in natural language what matters to them. Moreover, existing methods usually rely on visual cues alone to solve generic video summarization tasks. This work introduces a single unsupervised multi-modal framework that addresses both generic and query-focused video summarization. We use a multi-head attention model to represent the multi-modal features, and we apply a Transformer-based model to learn frame scores from representative, diversity, and reconstruction losses. In particular, we develop a novel representative loss that trains the model on both visual and semantic information. Our method outperforms previous state-of-the-art work on both generic and query-focused video summarization datasets.
Pages: 171-180
Number of pages: 10