Automatic Video Captioning via Multi-channel Sequential Encoding

Cited by: 2
Authors
Zhang, Chenyang [1]
Tian, Yingli [1]
Affiliations
[1] CUNY City Coll, Dept Elect Engn, New York, NY 10031 USA
Keywords
Video captioning; Long short-term memory; Sequential encoding; American Sign Language
DOI
10.1007/978-3-319-48881-3_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel two-stage video captioning framework composed of (1) a multi-channel video encoder and (2) a sentence-generating language decoder. Both the encoder and the decoder are based on recurrent neural networks with long short-term memory (LSTM) cells. Our system can take videos of arbitrary length as input. Compared with previous sequence-to-sequence video captioning frameworks, the proposed model can handle multiple channels of video representations and jointly learn how to combine them. The proposed model is evaluated on two large-scale movie datasets (MPII Corpus and Montreal Video Description) and one YouTube dataset (Microsoft Video Description Corpus), and achieves state-of-the-art performance. Furthermore, we extend the proposed model to automatic American Sign Language (ASL) recognition. To evaluate the performance of our model on this novel application, a new dataset for ASL video description is collected from YouTube videos. Results on this dataset indicate that the proposed framework is promising for ASL recognition and will significantly benefit independent communication between ASL users and others.
Pages: 146-161
Page count: 16