Rethinking Network for Classroom Video Captioning

Cited by: 0
Authors
Zhu, Mingjian [1 ,2 ,3 ]
Duan, Chenrui [1 ,2 ,3 ]
Yu, Changbin [2 ,3 ]
Affiliations
[1] Zhejiang Univ, Qingzhiwu Rd, Hangzhou, Peoples R China
[2] Westlake Univ, Sch Engn, 18 Shilongshan Rd, Hangzhou, Peoples R China
[3] Westlake Inst Adv Study, Inst Adv Technol, 18 Shilongshan Rd, Hangzhou, Peoples R China
Keywords
Classroom Video Captioning; Video Processing; Natural Language Processing;
DOI
10.1117/12.2589435
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding classroom activities can help parents and education experts analyze the teaching situation. However, employing staff to monitor events in the classroom is labor-intensive. Deploying surveillance video systems is considered a good solution to this problem, and converting the captured videos into textual descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task, Classroom Video Captioning (CVC), which aims to describe the events in classroom videos in natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture, the rethinking network, to encode visual features and generate descriptions. Extensive experiments on our dataset demonstrate that our method describes the events in classroom videos satisfactorily.
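The abstract describes the common video-captioning pipeline: encode per-frame visual features into a video representation, then decode it into a sentence. As a minimal sketch only (the paper's rethinking network is not specified here), the toy code below mean-pools frame features and runs greedy decoding with a random projection standing in for a trained decoder; the vocabulary and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative toy vocabulary for classroom scenes (hypothetical).
VOCAB = ["<bos>", "<eos>", "a", "teacher", "writes", "on", "the", "blackboard"]

def encode(frame_features: np.ndarray) -> np.ndarray:
    """Pool per-frame features of shape (T, D) into one video vector (D,).
    Real encoders (e.g. CNN + temporal model) are far richer; mean pooling
    is the simplest stand-in."""
    return frame_features.mean(axis=0)

def greedy_decode(video_vec: np.ndarray, W: np.ndarray, max_len: int = 8):
    """Greedily emit the most probable token at each step.
    W has shape (D + len(VOCAB), len(VOCAB)) and stands in for a trained
    decoder; a real captioner would use a recurrent or attention model."""
    tokens = ["<bos>"]
    for _ in range(max_len):
        prev = np.zeros(len(VOCAB))
        prev[VOCAB.index(tokens[-1])] = 1.0          # one-hot previous token
        logits = np.concatenate([video_vec, prev]) @ W
        tokens.append(VOCAB[int(np.argmax(logits))])
        if tokens[-1] == "<eos>":                    # stop at end-of-sentence
            break
    return tokens[1:]

# Usage: 10 frames of 16-dim features, random decoder weights.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 16))
W = rng.standard_normal((16 + len(VOCAB), len(VOCAB)))
caption = greedy_decode(encode(feats), W)
```

Because the decoder weights are random, the emitted caption is meaningless; the sketch only shows the encode-then-decode control flow shared by captioning systems like the one the abstract describes.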
Page count: 8