Automatic Video Captioning via Multi-channel Sequential Encoding

Cited by: 2

Authors:

Zhang, Chenyang [1]

Tian, Yingli [1]

Affiliations:

[1] CUNY City Coll, Dept Elect Engn, New York, NY 10031 USA

Source:

COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II | 2016 / Vol. 9914

Keywords:

Video captioning; Long-short-term-memory; Sequential encoding; American Sign Language

DOI:

10.1007/978-3-319-48881-3_11

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we propose a novel two-stage video captioning framework composed of (1) a multi-channel video encoder and (2) a sentence-generating language decoder. Both the encoder and the decoder are based on recurrent neural networks with long short-term memory (LSTM) cells. Our system can take videos of arbitrary length as input. Compared with previous sequence-to-sequence video captioning frameworks, the proposed model can handle multiple channels of video representations and jointly learn how to combine them. The proposed model is evaluated on two large-scale movie datasets (MPII Corpus and Montreal Video Description) and one YouTube dataset (Microsoft Video Description Corpus) and achieves state-of-the-art performance. Furthermore, we extend the proposed model towards automatic American Sign Language (ASL) recognition. To evaluate the model on this novel application, we collect a new dataset for ASL video description based on YouTube videos. Results on this dataset indicate that the proposed framework is promising for ASL recognition and will significantly benefit independent communication between ASL users and others.


Pages: 146-161

Page count: 16
