Rethinking Network for Classroom Video Captioning

Citations: 0
Authors
Zhu, Mingjian [1, 2, 3]
Duan, Chenrui [1, 2, 3]
Yu, Changbin [2, 3]
Affiliations
[1] Zhejiang Univ, Qingzhiwu Rd, Hangzhou, Peoples R China
[2] Westlake Univ, Sch Engn, 18 Shilongshan Rd, Hangzhou, Peoples R China
[3] Westlake Inst Adv Study, Inst Adv Technol, 18 Shilongshan Rd, Hangzhou, Peoples R China
Keywords
Classroom Video Captioning; Video Processing; Natural Language Processing
DOI
10.1117/12.2589435
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is widely believed that understanding classroom activities can help parents and education experts analyze the teaching situation. However, employing staff to supervise classroom events is labor-intensive. Deploying video surveillance systems is considered a good solution to this problem, and converting the videos captured by cameras into text descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task named Classroom Video Captioning (CVC), which aims to describe the events in classroom videos in natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture called the rethinking network to encode the visual features and generate the descriptions. Extensive experiments on our dataset demonstrate that our method can describe the events in classroom videos satisfactorily.
Pages: 8
Related papers
50 items in total
  • [21] VIDEO CAPTIONING WITH TEMPORAL AND REGION GRAPH CONVOLUTION NETWORK
    Xiao, Xinlong
    Zhang, Yuejie
    Feng, Rui
    Zhang, Tao
    Gao, Shang
    Fan, Weiguo
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [22] Memory-Attended Recurrent Network for Video Captioning
    Pei, Wenjie
    Zhang, Jiyuan
    Wang, Xiangrong
    Ke, Lei
    Shen, Xiaoyong
    Tai, Yu-Wing
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8339 - 8348
  • [23] Hierarchical Representation Network With Auxiliary Tasks for Video Captioning and Video Question Answering
    Gao, Lianli
    Lei, Yu
    Zeng, Pengpeng
    Song, Jingkuan
    Wang, Meng
    Shen, Heng Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 202 - 215
  • [24] Emotion-Prior Awareness Network for Emotional Video Captioning
    Song, Peipei
    Guo, Dan
    Yang, Xun
    Tang, Shengeng
    Yang, Erkun
    Wang, Meng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 589 - 600
  • [25] Hybrid Reasoning Network for Video-based Commonsense Captioning
    Yu, Weijiang
    Liang, Jian
    Ji, Lei
    Li, Lu
    Fang, Yuejian
    Xiao, Nong
    Duan, Nan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5213 - 5221
  • [26] Visual Commonsense-Aware Representation Network for Video Captioning
    Zeng, Pengpeng
    Zhang, Haonan
    Gao, Lianli
    Li, Xiangpeng
    Qian, Jin
    Shen, Heng Tao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 1092 - 1103
  • [27] Multi-feature fusion refine network for video captioning
    Wang, Guan-Hong
    Du, Ji-Xiang
    Zhang, Hong-Bo
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2022, 34 (03) : 483 - 497
  • [28] Multimodal-enhanced hierarchical attention network for video captioning
    Zhong, Maosheng
    Chen, Youde
    Zhang, Hao
    Xiong, Hao
    Wang, Zhixiang
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2469 - 2482
  • [29] A multi-layer memory sharing network for video captioning
    Niu, Tian-Zi
    Dong, Shan-Shan
    Chen, Zhen-Duo
    Luo, Xin
    Huang, Zi
    Guo, Shanqing
    Xu, Xin-Shun
    PATTERN RECOGNITION, 2023, 136
  • [30] Dual-Stream Recurrent Neural Network for Video Captioning
    Xu, Ning
    Liu, An-An
    Wong, Yongkang
    Zhang, Yongdong
    Nie, Weizhi
    Su, Yuting
    Kankanhalli, Mohan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (08) : 2482 - 2493