Describing Video With Attention-Based Bidirectional LSTM

Cited by: 181
Authors
Bin, Yi [1, 2]
Yang, Yang [1, 2]
Shen, Fumin [1, 2]
Xie, Ning [1, 2]
Shen, Heng Tao [1, 2]
Li, Xuelong [3]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
[3] Chinese Acad Sci, Ctr Opt Imagery Anal & Learning, Xian Inst Opt & Precis Mech, State Key Lab Transient Opt & Photon, Xian 710119, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bidirectional long-short term memory (BiLSTM); temporal attention; video captioning
DOI
10.1109/TCYB.2018.2831447
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches rely heavily on static visual information or capture only local temporal knowledge (e.g., within 16 frames), and thus struggle to describe motions accurately from a global view. In this paper, we propose a novel video captioning framework that integrates bidirectional long-short term memory (BiLSTM) and a soft attention mechanism to generate better global representations for videos and to enhance the recognition of lasting motions in videos. To generate video captions, we exploit another long-short term memory as a decoder to fully explore global contextual information. The benefits of our proposed method are twofold: 1) the BiLSTM structure comprehensively preserves global temporal and visual information and 2) the soft attention mechanism enables the language decoder to recognize and focus on the principal targets in complex content. We verify the effectiveness of our proposed video captioning framework on two widely used benchmarks, namely the Microsoft Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT), and the experimental results demonstrate the superiority of the proposed approach over several state-of-the-art methods.
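To make the soft attention idea in the abstract concrete, the following is a minimal sketch of one attention step over per-frame encoder states. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the scoring function, so a plain dot product is assumed here, and the names `soft_attention`, `encoder_states`, and `decoder_state` are hypothetical. Each entry of `encoder_states` stands in for a BiLSTM output at one time step (forward and backward hidden states concatenated).

```python
import math


def soft_attention(encoder_states, decoder_state):
    """Return (weights, context) for a single decoding step.

    encoder_states: list of equal-length float vectors, one per video frame.
    decoder_state:  float vector of the same length (the decoder's current state).
    """
    # Relevance score per time step. ASSUMPTION: dot-product scoring;
    # the paper's actual scoring function is not given in the abstract.
    scores = [
        sum(h_d * s_d for h_d, s_d in zip(h, decoder_state))
        for h in encoder_states
    ]

    # Numerically stable softmax turns scores into weights that sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three "frames" with 2-dimensional states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 2.0]
weights, context = soft_attention(states, query)
```

In this toy input the second and third frames align equally well with the query, so they receive equal weight and the first frame receives less. A decoder would recompute the weights at every word, which is what lets it focus on different frames as the caption unfolds.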
Pages: 2631-2641
Page count: 11