Describing Video With Attention-Based Bidirectional LSTM

Cited by: 181
Authors
Bin, Yi [1, 2]
Yang, Yang [1, 2]
Shen, Fumin [1, 2]
Xie, Ning [1, 2]
Shen, Heng Tao [1, 2]
Li, Xuelong [3]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
[3] Chinese Acad Sci, Ctr Opt Imagery Anal & Learning, Xian Inst Opt & Precis Mech, State Key Lab Transient Opt & Photon, Xian 710119, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bidirectional long-short term memory (BiLSTM); temporal attention; video captioning
DOI
10.1109/TCYB.2018.2831447
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches heavily rely on static visual information or only partially capture local temporal knowledge (e.g., within 16 frames), and thus can hardly describe motions accurately from a global view. In this paper, we propose a novel video captioning framework, which integrates bidirectional long-short term memory (BiLSTM) and a soft attention mechanism to generate better global representations for videos as well as enhance the recognition of lasting motions in videos. To generate video captions, we exploit another long-short term memory as a decoder to fully explore global contextual information. The benefits of our proposed method are twofold: 1) the BiLSTM structure comprehensively preserves global temporal and visual information and 2) the soft attention mechanism enables a language decoder to recognize and focus on principal targets from the complex content. We verify the effectiveness of our proposed video captioning framework on two widely used benchmarks, that is, Microsoft Video Description Corpus and MSR-Video to Text, and the experimental results demonstrate the superiority of the proposed approach compared to several state-of-the-art methods.
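The soft attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature vectors, and the dot-product scoring are all assumptions chosen for brevity, and the abstract does not specify how the paper scores frames. The sketch only shows the general idea of weighting per-frame encoder states (for example, concatenated forward and backward BiLSTM outputs) by their relevance to the current decoder state and summing them into one context vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(frame_states, decoder_state):
    """Return (weights, context) for a single decoding step.

    frame_states: list of T vectors, one per frame (hypothetical layout,
        e.g. concatenated forward/backward encoder hidden states).
    decoder_state: current decoder hidden vector of the same dimension.
    Scoring is a plain dot product, an assumption made for illustration.
    """
    # Relevance score of each frame with respect to the decoder state.
    scores = [sum(h * d for h, d in zip(state, decoder_state))
              for state in frame_states]
    # Normalise scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frame states.
    dim = len(frame_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, frame_states))
               for i in range(dim)]
    return weights, context


# Toy example: three frames with 2-dimensional states.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(frames, [0.0, 2.0])
```

In this toy call the decoder state only responds to the second feature dimension, so the second and third frames receive equal, larger weights than the first. A trained model would typically learn the scoring function rather than use a fixed dot product.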
Pages: 2631-2641
Number of pages: 11