Automatic Video Captioning via Multi-channel Sequential Encoding

Cited by: 2
Authors
Zhang, Chenyang [1 ]
Tian, Yingli [1 ]
Affiliations
[1] CUNY City Coll, Dept Elect Engn, New York, NY 10031 USA
Keywords
Video captioning; Long-short-term-memory; Sequential encoding; American Sign Language;
DOI
10.1007/978-3-319-48881-3_11
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel two-stage video captioning framework composed of (1) a multi-channel video encoder and (2) a sentence-generating language decoder. Both the encoder and the decoder are based on recurrent neural networks with long short-term memory (LSTM) cells. Our system can take videos of arbitrary length as input. Compared with previous sequence-to-sequence video captioning frameworks, the proposed model can handle multiple channels of video representations and jointly learns how to combine them. The proposed model is evaluated on two large-scale movie datasets (MPII Corpus and Montreal Video Description) and one YouTube dataset (Microsoft Video Description Corpus), and achieves state-of-the-art performance. Furthermore, we extend the proposed model towards automatic American Sign Language (ASL) recognition. To evaluate the performance of our model on this novel application, a new dataset for ASL video description is collected from YouTube videos. Results on this dataset indicate that the proposed framework is promising for ASL recognition and could significantly benefit independent communication between ASL users and others.
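The encoder idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the channels are fused, so everything below is an assumption made for illustration: one LSTM per channel, the final hidden states concatenated, and a single tanh fusion layer. The names `LSTMCell`, `MultiChannelEncoder`, `channel_sizes`, and `fusion` are hypothetical, the weights are random and untrained, and the language decoder is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + list(h)
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def encode(self, sequence):
        """Run over a sequence of any length and return the last hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


class MultiChannelEncoder:
    """Hypothetical fusion: concatenate per-channel states, then one tanh layer."""

    def __init__(self, channel_sizes, hidden_size, seed=0):
        rng = random.Random(seed)
        self.cells = [LSTMCell(s, hidden_size, rng) for s in channel_sizes]
        n = hidden_size * len(channel_sizes)
        self.fusion = [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                       for _ in range(hidden_size)]

    def encode(self, channels):
        concat = []
        for cell, seq in zip(self.cells, channels):
            concat.extend(cell.encode(seq))
        return [math.tanh(sum(w * v for w, v in zip(row, concat)))
                for row in self.fusion]


# Two assumed channels with different feature sizes and different lengths.
encoder = MultiChannelEncoder(channel_sizes=[4, 3], hidden_size=5)
appearance = [[0.1] * 4 for _ in range(7)]
motion = [[0.2] * 3 for _ in range(6)]
video_vector = encoder.encode([appearance, motion])
```

In a sequence-to-sequence setup, a fixed-size vector such as `video_vector` would typically initialise or condition the LSTM decoder that generates the sentence. In a trained model the fusion weights would be learned jointly with both networks rather than drawn at random.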
Pages: 146-161
Page count: 16