SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Cited by: 0
Authors
Jin, Tao [1]
Huang, Siyu [2]
Chen, Ming [3]
Li, Yingming [1]
Zhang, Zhongfei [4]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou, Peoples R China
[2] Baidu Res, Shanghai, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
[4] Binghamton Univ, Dept Comp Sci, Binghamton, NY USA
Funding
National Key Research and Development Program of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on the problem of effectively applying the transformer architecture to video captioning. The vanilla transformer was proposed for unimodal language generation tasks such as machine translation. Video captioning, however, is a multimodal learning problem, and the video features contain considerable redundancy across time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in the video representation. SBAT applies a boundary-aware pooling operation to the scores from multi-head attention and selects diverse features from different scenarios. SBAT also includes a local correlation scheme to compensate for the loss of local information caused by the sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms state-of-the-art methods on most metrics.
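The abstract describes SBAT only at a high level. The following is a minimal, hypothetical sketch of the general idea it names (pooling attention scores within boundary-delimited segments, then adding a local window to recover nearby detail); it is not the authors' formulation. The boundary rule (an L2-distance threshold between adjacent frame features), the use of max pooling per segment, the window-based form of local correlation, the single attention head, and the absence of learned Q/K/V projections are all assumptions, as are the helper names `segment_ids`, `sparse_weights`, and `sparse_self_attention`.

```python
import math
from typing import Dict, List, Set, Tuple

Frame = List[float]


def dot(a: Frame, b: Frame) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Masked entries are -inf; math.exp(-inf) evaluates to 0.0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment_ids(frames: List[Frame], threshold: float) -> List[int]:
    """Hypothetical boundary rule: start a new segment wherever the L2
    distance between adjacent frame features exceeds `threshold`."""
    ids = [0]
    for prev, cur in zip(frames, frames[1:]):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, cur)))
        ids.append(ids[-1] + 1 if dist > threshold else ids[-1])
    return ids


def sparse_weights(frames: List[Frame], i: int, seg: List[int], window: int) -> List[float]:
    """Attention weights of query frame i under the assumed sparse scheme."""
    d = len(frames[0])
    scores = [dot(frames[i], k) / math.sqrt(d) for k in frames]
    # Boundary-aware pooling (assumed to be max pooling): keep only the
    # highest-scoring key inside each segment.
    best: Dict[int, int] = {}
    for j, s in enumerate(scores):
        if seg[j] not in best or s > scores[best[seg[j]]]:
            best[seg[j]] = j
    keep: Set[int] = set(best.values())
    # Local correlation (assumed form): also keep keys within `window`
    # steps of the query, restoring detail the sparse selection drops.
    keep.update(range(max(0, i - window), min(len(frames), i + window + 1)))
    masked = [s if j in keep else -math.inf for j, s in enumerate(scores)]
    return softmax(masked)


def sparse_self_attention(
    frames: List[Frame], threshold: float, window: int
) -> Tuple[List[Frame], List[int]]:
    """Single-head self-attention over frames with Q = K = V = frames
    (no learned projections), using the hypothetical sparse mask above."""
    seg = segment_ids(frames, threshold)
    dim = len(frames[0])
    outputs = []
    for i in range(len(frames)):
        w = sparse_weights(frames, i, seg, window)
        # Each output is a convex combination of the frame features.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, frames)) for c in range(dim)])
    return outputs, seg


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.1]]
    out, seg = sparse_self_attention(frames, threshold=0.5, window=0)
    print(seg)  # [0, 0, 1, 1]
    print(sparse_weights(frames, 0, seg, window=0))
```

With `window=0`, frame 0 attends only to the top-scoring key of each segment (plus itself), so the near-duplicate frame 1 and frame 3 receive zero weight; widening `window` trades sparsity for local context, which is the tension the abstract's local correlation scheme is meant to address.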
Pages: 630-636
Number of pages: 7
Related papers (50 in total)
  • [31] Making Procedural Water Waves Boundary-aware
    Jeschke, S.
    Hafner, C.
    Chentanez, N.
    Macklin, M.
    Mueller-Fischer, M.
    Wojtan, C.
    COMPUTER GRAPHICS FORUM, 2020, 39 (08): 47 - 54
  • [32] Depth-Aware Sparse Transformer for Video-Language Learning
    Zhang, Haonan
    Gao, Lianli
    Zeng, Pengpeng
    Hanjalic, Alan
    Shen, Heng Tao
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 4778 - 4787
  • [33] Context-aware transformer for image captioning
    Yang, Xin
    Wang, Ying
    Chen, Haishun
    Li, Jie
    Huang, Tingting
    NEUROCOMPUTING, 2023, 549
  • [34] A Position-Aware Transformer for Image Captioning
    Deng, Zelin
    Zhou, Bo
    He, Pei
    Huang, Jianfeng
    Alfarraj, Osama
    Tolba, Amr
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (01): 2065 - 2081
  • [35] A position-aware transformer for image captioning
    Deng, Zelin
    Zhou, Bo
    He, Pei
    Huang, Jianfeng
    Alfarraj, Osama
    Tolba, Amr
    Tech Science Press, 70: 2005 - 2021
  • [36] Computer vision to recognize construction waste compositions: A novel boundary-aware transformer (BAT) model
    Dong, Zhiming
    Chen, Junjie
    Lu, Weisheng
    JOURNAL OF ENVIRONMENTAL MANAGEMENT, 2022, 305
  • [37] Boundary-Aware Network for Kidney Tumor Segmentation
    Hu, Shishuai
    Zhang, Jianpeng
    Xia, Yong
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2020, 2020, 12436: 189 - 198
  • [38] Boundary-aware Graph Convolution for Semantic Segmentation
    Hu, Hanzhe
    Cui, Jinshi
    Zha, Hongbin
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 1828 - 1835
  • [39] Deep boundary-aware semantic image segmentation
    Wu, Huisi
    Li, Yifan
    Chen, Le
    Liu, Xueting
    Li, Ping
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2021, 32 (3-4)
  • [40] Boundary-aware single fringe pattern demodulation
    Wang, Haixia
    Kemao, Qian
    OPTICS EXPRESS, 2017, 25 (26): 32669 - 32685