Video Summarization Based on Multimodal Features

Cited by: 0

Authors:

Zhang, Yu [1]

Liu, Ju [2]

Liu, Xiaoxi [1]

Gao, Xuesong [3]

Affiliations:

[1] Shandong Univ, Informat & Commun Engn, Qingdao, Peoples R China

[2] Shandong Univ, Dept Elect Engn, Qingdao, Peoples R China

[3] Hisense Grp, Qingdao, Peoples R China

Source:

INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT | 2020 / Vol. 11 / Issue 04

Keywords:

Feature Fusion; Information Science; LSTM; Multimedia Processing; Multimodal Features; Video Summarization;

DOI:

10.4018/IJMDEM.2020100104

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

In this manuscript, the authors present a keyshot-based supervised video summarization method in which feature fusion and LSTM networks are used for summarization. The framework has three parts: 1) The authors formulate video summarization as a sequence-to-sequence problem, in which the importance score of video content is predicted from the video feature sequence. 2) By considering visual and textual features simultaneously, the authors present deep-fusion multimodal features and summarize videos with a recurrent encoder-decoder architecture based on bi-directional LSTM. 3) Most importantly, to train the supervised video summarization framework, the authors adopt the number of users who selected the current video clip for their final video summary as the importance score and ground truth. Comparisons are performed with state-of-the-art methods and with different variants of FLSum and T-FLSum. The F-score and rank correlation coefficient results on TVSum and SumMe show the outstanding performance of the proposed method.
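
The ground-truth construction and the F-score evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each annotator's summary is available as a binary selection per clip. The function names `importance_scores` and `f_score`, the toy data, and the normalization by the number of annotators are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def importance_scores(user_selections: List[List[int]]) -> List[float]:
    """Per-clip importance: fraction of users who selected the clip.

    user_selections[u][t] is 1 if (hypothetical) user u kept clip t, else 0.
    Dividing by the number of users is an assumption made here for scaling;
    the abstract only states that the count of selecting users is used.
    """
    n_users = len(user_selections)
    n_clips = len(user_selections[0])
    counts = [sum(sel[t] for sel in user_selections) for t in range(n_clips)]
    return [c / n_users for c in counts]


def f_score(predicted: List[int], reference: List[int]) -> float:
    """Harmonic mean of precision and recall between two binary summaries."""
    overlap = sum(p and r for p, r in zip(predicted, reference))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted)
    recall = overlap / sum(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration): 3 annotators, 5 clips.
selections = [
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
]
scores = importance_scores(selections)
```

In this toy case the first clip, kept by all three annotators, receives the highest score, and a clip that nobody kept receives zero. A supervised model of the kind the abstract describes would be trained to regress such per-clip scores from the feature sequence.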

Pages: 60-76

Number of pages: 17
