Rethinking Network for Classroom Video Captioning

Cited by: 0

Authors:

Zhu, Mingjian [1,2,3]

Duan, Chenrui [1,2,3]

Yu, Changbin [2,3]

Affiliations:

[1] Zhejiang Univ, Qingzhiwu Rd, Hangzhou, Peoples R China

[2] Westlake Univ, Sch Engn, 18 Shilongshan Rd, Hangzhou, Peoples R China

[3] Westlake Inst Adv Study, Inst Adv Technol, 18 Shilongshan Rd, Hangzhou, Peoples R China

Source:

TWELFTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS | 2021 / Vol. 11719

Keywords:

Classroom Video Captioning; Video Processing; Natural Language Processing

DOI:

10.1117/12.2589435

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

It is widely believed that understanding classroom activities helps parents and education experts analyze the teaching situation. However, employing staff to supervise classroom events is labor-intensive. Deploying video surveillance systems is considered a good solution to this problem, and converting the videos captured by cameras into text descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task named Classroom Video Captioning (CVC), which aims to describe the events in classroom videos in natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture called the rethinking network to encode the visual features and generate the descriptions. Extensive experiments on our dataset demonstrate that our method can describe the events in classroom videos satisfactorily.

Pages: 8
