Rethinking Network for Classroom Video Captioning

Cited by: 0
Authors
Zhu, Mingjian [1 ,2 ,3 ]
Duan, Chenrui [1 ,2 ,3 ]
Yu, Changbin [2 ,3 ]
Affiliations
[1] Zhejiang Univ, Qingzhiwu Rd, Hangzhou, Peoples R China
[2] Westlake Univ, Sch Engn, 18 Shilongshan Rd, Hangzhou, Peoples R China
[3] Westlake Inst Adv Study, Inst Adv Technol, 18 Shilongshan Rd, Hangzhou, Peoples R China
Keywords
Classroom Video Captioning; Video Processing; Natural Language Processing;
DOI
10.1117/12.2589435
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding classroom activities can help parents and education experts analyze the teaching situation. However, employing staff to supervise events in the classroom consumes substantial human resources. Deploying surveillance video systems is a practical solution to this problem, and converting the captured videos into textual descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task named Classroom Video Captioning (CVC), which aims to describe the events in classroom videos with natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture called the rethinking network to encode the visual features and generate the descriptions. Extensive experiments on our dataset demonstrate that our method describes the events in classroom videos satisfactorily.
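The pipeline the abstract describes, encoding visual features from video frames and then decoding a natural-language sentence, can be illustrated with a toy encoder-decoder sketch. Everything below (the mean-pooling encoder, the tiny vocabulary, the hand-set step weights) is hypothetical and purely illustrative; it does not implement the paper's actual rethinking network, which additionally refines a first-pass caption.

```python
# Toy sketch of a generic video-captioning pipeline:
# per-frame features -> one clip vector -> greedy word-by-word decoding.
from typing import List

VOCAB = ["<eos>", "teacher", "writes", "on", "board"]  # hypothetical vocabulary

def encode_frames(frame_features: List[List[float]]) -> List[float]:
    """Encode per-frame feature vectors into a clip-level vector (mean pooling)."""
    n, dim = len(frame_features), len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]

def decode_greedy(clip_vec: List[float],
                  step_weights: List[List[List[float]]]) -> str:
    """step_weights[t][w] scores word w at decoding step t via a dot product;
    the highest-scoring word is emitted until <eos> is produced."""
    caption = []
    for weights_t in step_weights:
        scores = [sum(a * b for a, b in zip(w, clip_vec)) for w in weights_t]
        word = VOCAB[scores.index(max(scores))]
        if word == "<eos>":
            break
        caption.append(word)
    return " ".join(caption)

# Two 2-D frame features; hand-set weights make the decoder emit
# "teacher", then "writes", then stop at <eos>.
clip = encode_frames([[1.0, 0.0], [0.0, 1.0]])
steps = [
    [[0, 0], [1, 1], [0, 0], [0, 0], [0, 0]],  # step 0 favors "teacher"
    [[0, 0], [0, 0], [1, 1], [0, 0], [0, 0]],  # step 1 favors "writes"
    [[1, 1], [0, 0], [0, 0], [0, 0], [0, 0]],  # step 2 favors "<eos>"
]
print(decode_greedy(clip, steps))
```

In a real system the pooled vector would come from a pretrained visual backbone and the step weights from a trained language decoder; the structure of the loop, however, mirrors the encode-then-generate flow the abstract outlines.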
Pages: 8