Video Summarization Based on Multimodal Features

Authors
Zhang, Yu [1]
Liu, Ju [2]
Liu, Xiaoxi [1]
Gao, Xuesong [3]
Affiliations
[1] Shandong Univ, Informat & Commun Engn, Qingdao, Peoples R China
[2] Shandong Univ, Dept Elect Engn, Qingdao, Peoples R China
[3] Hisense Grp, Qingdao, Peoples R China
Keywords
Feature Fusion; Information Science; LSTM; Multimedia Processing; Multimodal Features; Video Summarization;
DOI
10.4018/IJMDEM.2020100104
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
In this manuscript, the authors present a keyshot-based supervised video summarization method that uses feature fusion and LSTM networks. The framework has three parts: 1) The authors formulate video summarization as a sequence-to-sequence problem, in which the importance score of video content is predicted from the video feature sequence. 2) By considering visual and textual features simultaneously, the authors present deep-fusion multimodal features and summarize videos with a recurrent encoder-decoder architecture based on bi-directional LSTM. 3) Most importantly, to train the supervised video summarization framework, the authors adopt as the importance score and ground truth the number of users who selected the current video clip for their final video summary. Comparisons are performed with state-of-the-art methods and with different variants of FLSum and T-FLSum. The F-score and rank correlation coefficient results on TVSum and SumMe show the outstanding performance of the proposed method.
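The third point of the abstract (user selections as ground truth) and the F-score evaluation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy vote matrix, the normalization of vote counts by the number of users, and the "keep the top two clips" rule are all assumptions made for illustration. The abstract does not describe the paper's actual keyshot selection or evaluation protocol.

```python
from typing import List, Sequence


def importance_from_votes(user_selections: Sequence[Sequence[int]]) -> List[float]:
    """Per-clip score: fraction of users who kept that clip.

    Assumption: votes are 0/1 and the raw count is divided by the
    number of users so that scores fall in [0, 1].
    """
    n_users = len(user_selections)
    n_clips = len(user_selections[0])
    return [sum(u[i] for u in user_selections) / n_users for i in range(n_clips)]


def f_score(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Harmonic mean of precision and recall over selected clips."""
    overlap = sum(p and r for p, r in zip(predicted, reference))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted)
    recall = overlap / sum(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 3 users, 5 clips; 1 means the user kept the clip.
votes = [
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
]
scores = importance_from_votes(votes)

# Toy selection rule (an assumption): keep the two highest-scoring clips.
top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
summary = [1 if i in top else 0 for i in range(len(scores))]

# Compare the toy summary against each user's own summary.
per_user_f = [f_score(summary, user) for user in votes]
```

In a supervised setup such as the one described, scores of this kind would serve as regression targets for the sequence model, and a predicted summary would be compared against each user's summary with a measure such as `f_score`.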
Pages: 60-76
Number of pages: 17