SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Times cited: 0
Authors
Jin, Tao [1]
Huang, Siyu [2]
Chen, Ming [3]
Li, Yingming [1]
Zhang, Zhongfei [4]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou, Peoples R China
[2] Baidu Res, Shanghai, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
[4] Binghamton Univ, Dept Comp Sci, Binghamton, NY USA
Funding
National Key R&D Program of China
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on how to apply the transformer architecture effectively to video captioning. The vanilla transformer was proposed for unimodal language generation tasks such as machine translation. Video captioning, however, is a multimodal learning problem, and video features are highly redundant across time steps. To address these issues, we propose a novel method called the sparse boundary-aware transformer (SBAT), which reduces the redundancy in the video representation. SBAT applies a boundary-aware pooling operation to the scores of multi-head attention and selects diverse features from different scenarios. SBAT also includes a local correlation scheme to compensate for the local information lost through the sparse operation. Building on SBAT, we further propose an aligned cross-modal encoding scheme to strengthen multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms state-of-the-art methods on most metrics.
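The abstract describes the sparse mechanism only at a high level. The following is a minimal, hypothetical sketch in plain Python of the general idea. It is not the authors' formulation, and it rests on three assumptions: (a) scene boundaries are placed where consecutive frame features differ most; (b) "boundary-aware pooling" keeps, for each query, only the highest-scoring key inside each segment; and (c) the local correlation is a window of `window` steps around each query whose positions are kept in addition to the sparse ones. All function names (`find_segments`, `boundary_sparse_mask`, `local_mask`, `sparse_local_attention`) are illustrative, and the aligned cross-modal encoding scheme is not covered.

```python
import math
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(xs)  # finite as long as at least one entry is kept
    exps = [math.exp(x - m) for x in xs]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def scaled_scores(q: Matrix, k: Matrix) -> Matrix:
    """Scaled dot-product scores q_i . k_j / sqrt(d), as in a standard attention head."""
    d = len(q[0])
    return [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]


def find_segments(frames: Matrix, num_segments: int) -> List[range]:
    """ASSUMPTION (hypothetical boundary detector): place num_segments - 1
    boundaries where consecutive frame features differ most (squared
    Euclidean distance). Requires 1 <= num_segments <= len(frames)."""
    n = len(frames)
    diffs = [sum((a - b) ** 2 for a, b in zip(frames[t], frames[t + 1]))
             for t in range(n - 1)]
    # t in cuts means a boundary lies between frame t and frame t + 1
    ranked = sorted(range(n - 1), key=lambda t: diffs[t], reverse=True)
    cuts = sorted(ranked[: num_segments - 1])
    starts = [0] + [t + 1 for t in cuts]
    ends = [t + 1 for t in cuts] + [n]
    return [range(s, e) for s, e in zip(starts, ends)]


def boundary_sparse_mask(scores: Matrix, segments: List[range]) -> Mask:
    """ASSUMPTION: boundary-aware pooling = max pooling of each query's
    scores within every segment; only the argmax key per segment survives."""
    keep = [[False] * len(row) for row in scores]
    for i, row in enumerate(scores):
        for seg in segments:
            best = max(seg, key=lambda j: row[j])
            keep[i][best] = True
    return keep


def local_mask(n: int, window: int) -> Mask:
    """Local correlation: keep keys at most `window` steps from the query."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def sparse_local_attention(q: Matrix, k: Matrix, v: Matrix, frames: Matrix,
                           num_segments: int = 2, window: int = 1) -> Matrix:
    """One self-attention head (len(q) == len(k) == len(v) == len(frames)).
    ASSUMPTION: a score is kept if EITHER the sparse or the local mask allows it."""
    scores = scaled_scores(q, k)
    sparse = boundary_sparse_mask(scores, find_segments(frames, num_segments))
    local = local_mask(len(k), window)
    out = []
    for i, row in enumerate(scores):
        masked = [s if (sparse[i][j] or local[i][j]) else -math.inf
                  for j, s in enumerate(row)]
        weights = softmax(masked)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

For example, with `frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]`, `find_segments(frames, 2)` splits the clip into frames 0-1 and 2-3. Each query then attends to one frame per scene, plus its neighbours inside the window. Taking the union of the two masks is a simplification chosen for this sketch; the paper may combine the sparse and local signals differently.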
Pages: 630-636
Page count: 7