Multi-sentence video captioning using spatial saliency of video frames and content-oriented beam search algorithm

Cited by: 5
Authors
Nabati, Masoomeh [1 ]
Behrad, Alireza [1 ]
Affiliations
[1] Shahed Univ, Elect Engn Dept, Tehran 3319118651, Iran
Keywords
Multi-sentence video captioning; Visual saliency; Beam searching; Deep neural network; ATTENTION; ARCHITECTURE; IMAGE; TEXT;
DOI
10.1016/j.eswa.2023.120454
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video captioning algorithms aim to express the information and activities contained in a video clip in the form of natural-language sentences. Most existing video captioning approaches use only one sentence to describe the semantic content of a video. However, a single sentence cannot convey all the semantic information of a video, especially for videos with highly informative content. Although a few studies have addressed multi-sentence video captioning, such as paragraph and dense captioning, they produce several sentences by focusing on different activities, objects, or temporal segments of a video. However, a video clip with a single object or activity may still contain a wealth of information from different perspectives that cannot be described effectively by a single sentence. To address this problem, we propose a multi-sentence video captioning algorithm that uses the spatial saliency of video frames together with a content-oriented beam search algorithm. In the proposed algorithm, the spatial saliency of video frames is employed during the encoding stage to generate informative sentences by focusing on different parts of the frames. Furthermore, a content-oriented beam search algorithm is employed during the decoding stage to generate diverse candidate sentences. A multi-stage filter is also employed to remove sentences with incorrect structure or low relevance to the semantic content of the video. To evaluate the performance of the proposed algorithm, two well-known video description databases were used, and the results showed a significant improvement in the evaluation metrics, especially for the best-1 sentences. We also tested the proposed algorithm on several real-life movies.
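The abstract describes a content-oriented beam search used during decoding, but does not give the scoring details. As a rough illustration of the general idea, the sketch below mixes a language model's cumulative log-probability with a content-relevance bonus when ranking beams; `step_fn`, `relevance_fn`, and the weight `alpha` are hypothetical placeholders, not the authors' implementation.

```python
def content_oriented_beam_search(step_fn, relevance_fn, start_token, end_token,
                                 beam_width=3, max_len=20, alpha=1.0):
    """Generic beam search whose ranking score is the cumulative log-probability
    plus a weighted content-relevance bonus (hypothetical `relevance_fn`), so
    hypotheses that better match the video content are kept in the beam.

    step_fn(seq) -> iterable of (next_token, log_prob) continuations.
    relevance_fn(seq) -> float relevance of the partial sentence to the content.
    """
    score = lambda seq, logp: logp + alpha * relevance_fn(seq)
    beams = [([start_token], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates, active = [], False
        for seq, logp in beams:
            if seq[-1] == end_token:            # finished hypotheses stay in the pool
                candidates.append((seq, logp))
                continue
            active = True
            for token, token_logp in step_fn(seq):
                candidates.append((seq + [token], logp + token_logp))
        if not active or not candidates:        # every beam finished (or dead end)
            beams = candidates or beams
            break
        # Keep the top-k beams by combined log-prob + relevance score.
        candidates.sort(key=lambda c: score(*c), reverse=True)
        beams = candidates[:beam_width]
    beams.sort(key=lambda c: score(*c), reverse=True)
    return [seq for seq, _ in beams]
```

With `alpha > 0`, a lower-probability sentence that mentions salient content can outrank a higher-probability but generic one, which is the behaviour a content-oriented beam search is after.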
Pages: 18
Related Papers
8 records
  • [1] Multi-Sentence Video Captioning using Content-oriented Beam Searching and Multi-stage Refining Algorithm
    Nabati, Masoomeh
    Behrad, Alireza
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (06)
  • [2] Implicit and explicit commonsense for multi-sentence video captioning
    Chou, Shih-Han
    Little, James J.
    Sigal, Leonid
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 247
  • [3] On Generating Content-Oriented Geo Features for Sensor-Rich Outdoor Video Search
    Yin, Yifang
    Yu, Yi
    Zimmermann, Roger
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (10) : 1760 - 1772
  • [4] COVAD: Content-oriented video anomaly detection using a self attention-based deep learning model
    Shao, Wenhao
    Rajapaksha, Praboda
    Wei, Yanyan
    Li, Dun
    Crespi, Noel
    Luo, Zhigang
    VIRTUAL REALITY & INTELLIGENT HARDWARE, 2023, 5 (01) : 24 - 41
  • [5] COVAD: Content-Oriented Video Anomaly Detection using a Self-Attention based Deep Learning Model
    Shao W.
    Rajapaksha P.
    Wei Y.
    Li D.
    Crespi N.
    Luo Z.
    Virtual Reality and Intelligent Hardware, 2023, 5 (01): : 24 - 41
  • [6] Algorithm combination of deblurring and denoising on video frames using the method search of local features on image
    Semenishchev, Evgeny
    XIII INTERNATIONAL SCIENTIFIC-TECHNICAL CONFERENCE DYNAMIC OF TECHNICAL SYSTEMS (DTS-2017), 2017, 132
  • [7] Selective encryption of video frames using the one-time random key algorithm and permutation techniques for secure transmission over the content delivery network
    Murari, T. Vijaya
    Ravishankar, K. C.
    Raghu, M. E.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (35) : 82303 - 82342
  • [8] An efficient field-programmable gate array-based hardware oriented block motion estimation algorithm based on diamond adaptive rood pattern search algorithm for multi-standard video codec
    Balamurugan, S. M.
    Seshasayanan, R.
    TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2021, 43 (16) : 3672 - 3685