Rethinking Network for Classroom Video Captioning

Cited by: 0
Authors
Zhu, Mingjian [1 ,2 ,3 ]
Duan, Chenrui [1 ,2 ,3 ]
Yu, Changbin [2 ,3 ]
Affiliations
[1] Zhejiang Univ, Qingzhiwu Rd, Hangzhou, Peoples R China
[2] Westlake Univ, Sch Engn, 18 Shilongshan Rd, Hangzhou, Peoples R China
[3] Westlake Inst Adv Study, Inst Adv Technol, 18 Shilongshan Rd, Hangzhou, Peoples R China
Keywords
Classroom Video Captioning; Video Processing; Natural Language Processing;
DOI
10.1117/12.2589435
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding classroom activities can help parents and education experts analyze the teaching situation. However, employing staff to supervise events in the classroom consumes substantial human resources. Deploying surveillance video systems is a practical solution to this problem, and converting the captured videos into textual descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task named Classroom Video Captioning (CVC), which aims to describe the events in classroom videos with natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture called the rethinking network to encode the visual features and generate the descriptions. Extensive experiments on our dataset demonstrate that our method describes the events in classroom videos satisfactorily.
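The pipeline the abstract describes, encoding visual features from video frames and then decoding a natural-language sentence, can be illustrated with a toy encoder-decoder sketch. Everything below (the mean-pooling encoder, the tiny vocabulary, the hand-set step weights) is hypothetical and purely illustrative; it does not implement the paper's actual rethinking network, which additionally refines a first-pass caption.

```python
# Toy sketch of a generic video-captioning pipeline:
# per-frame features -> one clip vector -> greedy word-by-word decoding.
from typing import List

VOCAB = ["<eos>", "teacher", "writes", "on", "board"]  # hypothetical vocabulary

def encode_frames(frame_features: List[List[float]]) -> List[float]:
    """Encode per-frame feature vectors into a clip-level vector (mean pooling)."""
    n, dim = len(frame_features), len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]

def decode_greedy(clip_vec: List[float],
                  step_weights: List[List[List[float]]]) -> str:
    """step_weights[t][w] scores word w at decoding step t via a dot product;
    the highest-scoring word is emitted until <eos> is produced."""
    caption = []
    for weights_t in step_weights:
        scores = [sum(a * b for a, b in zip(w, clip_vec)) for w in weights_t]
        word = VOCAB[scores.index(max(scores))]
        if word == "<eos>":
            break
        caption.append(word)
    return " ".join(caption)

# Two 2-D frame features; hand-set weights make the decoder emit
# "teacher", then "writes", then stop at <eos>.
clip = encode_frames([[1.0, 0.0], [0.0, 1.0]])
steps = [
    [[0, 0], [1, 1], [0, 0], [0, 0], [0, 0]],  # step 0 favors "teacher"
    [[0, 0], [0, 0], [1, 1], [0, 0], [0, 0]],  # step 1 favors "writes"
    [[1, 1], [0, 0], [0, 0], [0, 0], [0, 0]],  # step 2 favors "<eos>"
]
print(decode_greedy(clip, steps))
```

In a real system the pooled vector would come from a pretrained visual backbone and the step weights from a trained language decoder; the structure of the loop, however, mirrors the encode-then-generate flow the abstract outlines.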
Pages: 8