Rethinking Network for Classroom Video Captioning

Cited by: 0
Authors
Zhu, Mingjian [1 ,2 ,3 ]
Duan, Chenrui [1 ,2 ,3 ]
Yu, Changbin [2 ,3 ]
Affiliations
[1] Zhejiang Univ, Qingzhiwu Rd, Hangzhou, Peoples R China
[2] Westlake Univ, Sch Engn, 18 Shilongshan Rd, Hangzhou, Peoples R China
[3] Westlake Inst Adv Study, Inst Adv Technol, 18 Shilongshan Rd, Hangzhou, Peoples R China
Keywords
Classroom Video Captioning; Video Processing; Natural Language Processing;
DOI
10.1117/12.2589435
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding classroom activities can help parents and education experts analyze the teaching situation. However, employing staff to monitor events in the classroom is labor-intensive. Deploying surveillance video systems is considered a good solution to this problem, and converting the captured videos into textual descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task, Classroom Video Captioning (CVC), which aims to describe the events in classroom videos in natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture, the rethinking network, to encode visual features and generate descriptions. Extensive experiments on our dataset demonstrate that our method describes the events in classroom videos satisfactorily.
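The abstract describes the common video-captioning pipeline: encode per-frame visual features into a video representation, then decode it into a sentence. As a minimal sketch only (the paper's rethinking network is not specified here), the toy code below mean-pools frame features and runs greedy decoding with a random projection standing in for a trained decoder; the vocabulary and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative toy vocabulary for classroom scenes (hypothetical).
VOCAB = ["<bos>", "<eos>", "a", "teacher", "writes", "on", "the", "blackboard"]

def encode(frame_features: np.ndarray) -> np.ndarray:
    """Pool per-frame features of shape (T, D) into one video vector (D,).
    Real encoders (e.g. CNN + temporal model) are far richer; mean pooling
    is the simplest stand-in."""
    return frame_features.mean(axis=0)

def greedy_decode(video_vec: np.ndarray, W: np.ndarray, max_len: int = 8):
    """Greedily emit the most probable token at each step.
    W has shape (D + len(VOCAB), len(VOCAB)) and stands in for a trained
    decoder; a real captioner would use a recurrent or attention model."""
    tokens = ["<bos>"]
    for _ in range(max_len):
        prev = np.zeros(len(VOCAB))
        prev[VOCAB.index(tokens[-1])] = 1.0          # one-hot previous token
        logits = np.concatenate([video_vec, prev]) @ W
        tokens.append(VOCAB[int(np.argmax(logits))])
        if tokens[-1] == "<eos>":                    # stop at end-of-sentence
            break
    return tokens[1:]

# Usage: 10 frames of 16-dim features, random decoder weights.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 16))
W = rng.standard_normal((16 + len(VOCAB), len(VOCAB)))
caption = greedy_decode(encode(feats), W)
```

Because the decoder weights are random, the emitted caption is meaningless; the sketch only shows the encode-then-decode control flow shared by captioning systems like the one the abstract describes.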
Page count: 8