Hierarchical Context-aware Network for Dense Video Event Captioning

Authors
Ji, Lei [1, 2, 3]
Guo, Xianglin [4]
Huang, Haoyang [3]
Chen, Xilin [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] NYU, New York, NY USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dense video event captioning aims to generate a sequence of descriptive captions, one for each event in a long untrimmed video. Video-level context provides important information and helps the model generate captions that are consistent and less redundant across events. In this paper, we introduce a novel Hierarchical Context-aware Network for dense video event captioning (HCN) to capture context from various aspects. In detail, the model leverages local and global context with different mechanisms and learns them jointly to generate coherent captions. The local context module performs full interaction between neighboring frames, and the global context module selectively attends to previous or future events. According to our extensive experiments on both the YouCook2 and ActivityNet Captioning datasets, the video-level HCN model outperforms the event-level context-agnostic model by a large margin. The code is available at https://github.com/KirkGuo/HCN.
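The abstract names two mechanisms without giving details: local interaction between neighboring frames and selective attention over previous or future events. The sketch below is one possible reading, not the authors' implementation. It assumes plain scaled dot-product attention, a fixed neighborhood size (`window`), and hypothetical function names (`attend`, `local_context`, `global_context`); none of these come from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def local_context(frames, window=1):
    """Assumed local module: each frame attends to every frame that lies
    within `window` positions of it (itself included)."""
    out = []
    for i in range(len(frames)):
        lo, hi = max(0, i - window), min(len(frames), i + window + 1)
        neighbors = frames[lo:hi]
        out.append(attend(frames[i], neighbors, neighbors))
    return out


def global_context(events, index):
    """Assumed global module: the event at `index` attends to all other
    events in the video, both earlier and later ones."""
    others = [e for j, e in enumerate(events) if j != index]
    if not others:
        # A video with a single event has no global context to attend to.
        return list(events[index])
    return attend(events[index], others, others)


# Toy usage: four 2-d frame features, mean-pooled into two events.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
local = local_context(frames, window=1)
events = [
    [(a + b) / 2 for a, b in zip(local[0], local[1])],
    [(a + b) / 2 for a, b in zip(local[2], local[3])],
]
context_for_first_event = global_context(events, 0)
```

In this toy setup the attention weights decide how much each neighbor or event contributes, which is the sense in which the global step is "selective"; a trained model would use learned projections rather than raw features.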
Pages: 2004-2013
Page count: 10