Rethinking Network for Classroom Video Captioning

Cited by: 0
Authors
Zhu, Mingjian [1, 2, 3]
Duan, Chenrui [1, 2, 3]
Yu, Changbin [2, 3]
Affiliations
[1] Zhejiang Univ, Qingzhiwu Rd, Hangzhou, Peoples R China
[2] Westlake Univ, Sch Engn, 18 Shilongshan Rd, Hangzhou, Peoples R China
[3] Westlake Inst Adv Study, Inst Adv Technol, 18 Shilongshan Rd, Hangzhou, Peoples R China
Keywords
Classroom Video Captioning; Video Processing; Natural Language Processing
DOI
10.1117/12.2589435
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is widely believed that understanding classroom activities can help parents and education experts analyze the teaching situation. However, employing staff to monitor classroom events consumes considerable human resources. Deploying video surveillance systems is considered a good solution to this problem, and converting the videos captured by cameras into text descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task, Classroom Video Captioning (CVC), which aims to describe the events in classroom videos in natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture called the rethinking network to encode the visual features and generate the descriptions. Extensive experiments on our dataset demonstrate that our method can describe the events in classroom videos satisfactorily.
Pages: 8