VIDEOWHISPER: Toward Discriminative Unsupervised Video Feature Learning With Attention-Based Recurrent Neural Networks

Cited by: 20
Authors
Zhao, Na [1]
Zhang, Hanwang [1]
Hong, Richang [2]
Wang, Meng [2]
Chua, Tat-Seng [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 117417, Singapore
[2] Hefei Univ Technol, Sch Comp & Informat, Hefei 230009, Anhui, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
Recurrent neural networks; sequence learning; unsupervised feature learning; video features; RECOGNITION;
DOI
10.1109/TMM.2017.2722687
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present VIDEOWHISPER, a novel approach for unsupervised video representation learning. Based on the observation that the frame sequence encodes the temporal dynamics of a video (e.g., object movement and event evolution), we treat the frame sequential order as a self-supervision to learn video representations. Unlike other unsupervised video feature learning methods based on frame-level feature reconstruction that is sensitive to visual variance, VIDEOWHISPER is driven by a novel video "sequence-to-whisper" learning strategy. Specifically, for each video sequence, we use a prelearned visual dictionary to generate a sequence of high-level semantics, dubbed "whisper," which can be considered as the language describing the video dynamics. In this way, we model VIDEOWHISPER as an end-to-end sequence-to-sequence learning model using attention-based recurrent neural networks. This model is trained to predict the whisper sequence and hence it is able to learn the temporal structure of videos. We propose two ways to generate video representation from the model. Through extensive experiments on two real-world video datasets, we demonstrate that video representation learned by VIDEOWHISPER is effective to boost fundamental multimedia applications such as video retrieval and event classification.
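The abstract says a prelearned visual dictionary turns each video into a "whisper" sequence, but it does not say how frames are matched to dictionary entries. The following is a minimal sketch of one plausible reading, assuming each frame is already a feature vector and is assigned to its nearest dictionary entry by Euclidean distance. The function name `frames_to_whisper`, the toy dictionary, and the nearest-neighbor rule are all illustrative assumptions, not the authors' implementation.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def frames_to_whisper(frame_features, dictionary):
    """Map each frame feature to the index of its nearest dictionary entry.

    Hypothetical helper: nearest-neighbor assignment is an assumption made
    for illustration; the abstract does not specify the quantization rule.
    """
    whisper = []
    for feat in frame_features:
        # Index of the dictionary entry closest to this frame feature.
        nearest = min(range(len(dictionary)),
                      key=lambda k: dist(feat, dictionary[k]))
        whisper.append(nearest)
    return whisper


# Toy 2-D "visual dictionary" with three entries (made-up values).
dictionary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Four toy frame features in temporal order (made-up values).
frames = [(0.1, 0.0), (0.9, 0.1), (0.8, 0.2), (0.1, 0.9)]

whisper = frames_to_whisper(frames, dictionary)
print(whisper)
```

Under this reading, the resulting index sequence would be the target that the sequence-to-sequence model is trained to predict from the frames.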
Pages: 2080-2092
Page count: 13
Related papers
  • [1] Discriminative Unsupervised Feature Learning with Convolutional Neural Networks
    Dosovitskiy, Alexey
    Springenberg, Jost Tobias
    Riedmiller, Martin
    Brox, Thomas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [2] Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks
    Dosovitskiy, Alexey
    Fischer, Philipp
    Springenberg, Jost Tobias
    Riedmiller, Martin
    Brox, Thomas
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (09) : 1734 - 1747
  • [3] Discriminative Feature Learning for Unsupervised Video Summarization
    Jung, Yunjae
    Cho, Donghyeon
    Kim, Dahun
    Woo, Sanghyun
    Kweon, In So
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8537 - 8544
  • [4] Text Classification Research with Attention-based Recurrent Neural Networks
    Du, C.
    Huang, L.
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2018, 13 (01) : 50 - 61
  • [5] Attention-Based Bidirectional Recurrent Neural Networks for Description Generation of Videos
    Du, Xiaotong
    Yuan, Jiabin
    Liu, Hu
    CLOUD COMPUTING AND SECURITY, PT VI, 2018, 11068 : 440 - 451
  • [6] Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding
    Blanch, Marc Gorriz
    Blasi, Saverio
    Smeaton, Alan F.
    O'Connor, Noel E.
    Mrak, Marta
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2021, 15 (02) : 366 - 377
  • [7] An Attention-Based Convolutional Recurrent Neural Networks for Scene Text Recognition
    Alshawi, Adil Abdullah Abdulhussein
    Tanha, Jafar
    Balafar, Mohammad Ali
    IEEE ACCESS, 2024, 12 : 8123 - 8134
  • [8] Attention-Based Radar PRI Modulation Recognition With Recurrent Neural Networks
    Li, Xueqiong
    Liu, Zhangmeng
    Huang, Zhitao
    IEEE ACCESS, 2020, 8 : 57426 - 57436
  • [9] Attention-Based Phonetic Convolutional Recurrent Neural Networks for Language Identification
    Gundluru, Ramesh
    Venkatesh, Vayyavuru
    Murty, K. Sri Rama
    2021 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2021, : 475 - 480
  • [10] Text Language Identification Using Attention-Based Recurrent Neural Networks
    Perelkiewicz, Michal
    Poswiata, Rafal
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2019, 11508 : 181 - 190