Automatic Video Captioning via Multi-channel Sequential Encoding

Cited by: 2
Authors
Zhang, Chenyang [1]
Tian, Yingli [1]
Affiliations
[1] CUNY City Coll, Dept Elect Engn, New York, NY 10031 USA
Keywords
Video captioning; Long short-term memory; Sequential encoding; American Sign Language
DOI
10.1007/978-3-319-48881-3_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel two-stage video captioning framework composed of (1) a multi-channel video encoder and (2) a sentence-generating language decoder. Both the encoder and the decoder are based on recurrent neural networks with long short-term memory (LSTM) cells. Our system can take videos of arbitrary length as input. Compared with previous sequence-to-sequence video captioning frameworks, the proposed model is able to handle multiple channels of video representations and jointly learn how to combine them. The proposed model is evaluated on two large-scale movie datasets (MPII Corpus and Montreal Video Description) and one YouTube dataset (Microsoft Video Description Corpus), and achieves state-of-the-art performance. Furthermore, we extend the proposed model towards automatic American Sign Language (ASL) recognition. To evaluate the performance of our model on this novel application, a new dataset for ASL video description is collected from YouTube videos. Results on this dataset indicate that the proposed framework is promising for ASL recognition and will significantly benefit independent communication between ASL users and others.
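The encoder idea in the abstract (one recurrent encoder per feature channel, with the channel summaries combined into a single video code) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the channel names, the feature sizes, the softmax-weighted fusion, and the untrained random weights are all hypothetical, and the language decoder is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(wi * zi for wi, zi in zip(row, z)) + b
                for row, b in zip(self.w[gate], self.b[gate])]

    def step(self, x, h, c):
        z = x + h  # concatenate the frame feature and the previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("c", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def encode_channel(cell, frames):
    """Run one channel's frame features through its LSTM; return the last hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in frames:
        h, c = cell.step(list(x), h, c)
    return h


def fuse(channel_states, logits):
    """Hypothetical fusion: softmax-weighted sum of the per-channel hidden states."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    size = len(channel_states[0])
    return [sum(w * s[k] for w, s in zip(weights, channel_states))
            for k in range(size)]


def encode_video(cells, channels, logits):
    states = [encode_channel(cell, frames) for cell, frames in zip(cells, channels)]
    return fuse(states, logits)


rng = random.Random(0)
hidden = 4
feature_sizes = (6, 3)  # hypothetical "appearance" and "motion" channels
cells = [LSTMCell(d, hidden, rng) for d in feature_sizes]
logits = [0.0, 0.0]  # fusion parameters; these would be learned jointly in training


def random_video(num_frames):
    """Stand-in for real per-frame features: one frame sequence per channel."""
    return [[[rng.random() for _ in range(d)] for _ in range(num_frames)]
            for d in feature_sizes]


short_code = encode_video(cells, random_video(5), logits)
long_code = encode_video(cells, random_video(12), logits)
print(len(short_code), len(long_code))  # both codes have length 4
```

Because each channel is summarized by its final hidden state, videos with 5 and 12 frames map to codes of the same size, which is what lets a sequence decoder consume inputs of arbitrary length. In a trained system the fusion parameters and LSTM weights would be optimized together with the decoder.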
Pages: 146-161
Number of pages: 16