CAPTURING LONG-RANGE DEPENDENCIES IN VIDEO CAPTIONING

Cited by: 0
Authors
Lee, Jaeyoung [1, 3]
Lee, Yekang [1]
Seong, Sihyeon [1, 3]
Kim, Kyungsu [2]
Kim, Sungjin [2]
Kim, Junmo [1, 3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Samsung Res, Seoul, South Korea
[3] Mofl Inc, Seoul, South Korea
Keywords
Video captioning; non-local block; long short-term memory; long-range dependency; video representation
DOI
10.1109/icip.2019.8803143
Chinese Library Classification
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
Most video captioning networks rely on recurrent models, including long short-term memory (LSTM). However, these recurrent models have a long-range dependency problem; thus, they are not sufficient for video encoding. To overcome this limitation, several studies investigated the relationships between objects or entities and have shown excellent performance in video classification and video captioning. In this study, we analyze a video captioning network with a non-local block in terms of temporal capacity. We introduce a video captioning method to capture long-range temporal dependencies with a non-local block. The proposed model independently uses local and non-local features. We evaluate our approach on the Microsoft Video Description Corpus (MSVD, YouTube2Text) dataset. The experimental results show that a non-local block applied along the temporal axis can solve the long-range dependency problem of the LSTM in video captioning datasets.
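Illustrative sketch (not part of the record): the abstract describes a non-local block applied along the temporal axis. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation. It assumes each frame is already encoded as a feature vector, uses plain dot-product similarity with a softmax over time, and omits the learned projections that a trained non-local block would normally include. The function name `temporal_non_local` is hypothetical.

```python
import math


def temporal_non_local(frames):
    """Simplified non-local operation over the time axis (assumed form).

    frames: list of T per-frame feature vectors (lists of floats).
    Returns T vectors of the same dimension. No learned weights are used;
    a real block would add trainable projections.
    """
    dim = len(frames[0])
    out = []
    for x_i in frames:
        # Similarity of frame i with every frame j, near or far in time.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in frames]
        # Softmax over all time steps (shifted by the max for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all frames: the non-local response for frame i.
        y_i = [sum(w * x_j[d] for w, x_j in zip(weights, frames))
               for d in range(dim)]
        # Residual connection keeps the original (local) feature.
        out.append([x + y for x, y in zip(x_i, y_i)])
    return out


# Three frames; the first and last are identical but far apart in time.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
enriched = temporal_non_local(features)
```

Because every frame attends to every other frame in one step, the first and last frames here influence each other directly, whereas a recurrent model would have to carry that information through the intermediate step. The sketch covers only the non-local part; how the paper combines it with local features is not specified in this record.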
Pages: 1880-1884
Page count: 5
Related papers
50 in total
  • [1] Multi-semantic long-range dependencies capturing for efficient video representation learning
    Duan, Jinhao
    Xu, Hua
    Lin, Xiaozhu
    Zhu, Shangchao
    Du, Yuanze
    IMAGE AND VISION COMPUTING, 2020, 104
  • [2] Long-Range Feature Dependencies Capturing for Low-Resolution Image Classification
    Kang, Sheng
    Wang, Yang
    Cao, Yang
    Zha, Zheng-Jun
    MULTIMEDIA MODELING, MMM 2022, PT II, 2022, 13142 : 3 - 14
  • [3] Joint multi-scale information and long-range dependence for video captioning
    Zhongyi Zhai
    Xiaofeng Chen
    Yishuang Huang
    Lingzhong Zhao
    Bo Cheng
    Qian He
    International Journal of Multimedia Information Retrieval, 2023, 12
  • [4] Joint multi-scale information and long-range dependence for video captioning
    Zhai, Zhongyi
    Chen, Xiaofeng
    Huang, Yishuang
    Zhao, Lingzhong
    Cheng, Bo
    He, Qian
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (02)
  • [5] Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulation
    Birnbaum, Sawyer
    Kuleshov, Volodymyr
    Enam, S. Zayd
    Koh, Pang Wei
    Ermon, Stefano
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [6] Online learning of long-range dependencies
    Zucchet, Nicolas
    Meier, Robert
    Schug, Simon
    Mujika, Asier
    Sacramento, Joao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [7] Long-Range Dependencies in Algorithmic Computing
    Strzalka, Dominik
    Grabowski, Franciszek
    2008 CONFERENCE ON HUMAN SYSTEM INTERACTIONS, VOLS 1 AND 2, 2008, : 570 - 575
  • [8] Capturing long-range correlations with patch models
    Cheung, Vincent
    Jojic, Nebojsa
    Samaras, Dimitris
    2007 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-8, 2007, : 974 - +
  • [9] Capturing short-range and long-range dependencies of nucleotides for identifying RNA N6-methyladenosine modification sites
    Li, Guodong
    Zhao, Bowei
    Su, Xiaorui
    Yang, Yue
    Zeng, Zhi
    Hu, Pengwei
    Hu, Lun
    Computers in Biology and Medicine, 186
  • [10] INVESTIGATION OF LONG-RANGE DEPENDENCIES IN DAILY GPS SOLUTIONS
    Klos, Anna
    Bogusz, Janusz
    Figurski, Mariusz
    Kujawa, Marcin
    INTERNATIONAL WORK-CONFERENCE ON TIME SERIES (ITISE 2014), 2014, : 434 - 434