SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Times cited: 0
Authors
Jin, Tao [1]
Huang, Siyu [2]
Chen, Ming [3]
Li, Yingming [1]
Zhang, Zhongfei [4]
Affiliations
[1] Zhejiang University, College of Information Science and Electronic Engineering, Hangzhou, People's Republic of China
[2] Baidu Research, Shanghai, People's Republic of China
[3] Alibaba Group, Hangzhou, People's Republic of China
[4] Binghamton University, Department of Computer Science, Binghamton, NY, USA
Funding
National Key Research and Development Program of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on the problem of effectively applying the transformer structure to video captioning. The vanilla transformer was proposed for unimodal language generation tasks such as machine translation. However, video captioning is a multimodal learning problem, and video features contain substantial redundancy across time steps. To address these concerns, we propose a novel method called the sparse boundary-aware transformer (SBAT) to reduce the redundancy in the video representation. SBAT applies a boundary-aware pooling operation to the scores from multihead attention and selects diverse features from different scenarios. SBAT also includes a local correlation scheme to compensate for the loss of local information caused by the sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to strengthen multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms state-of-the-art methods on most metrics.
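The abstract does not specify how boundary-aware pooling or the local correlation scheme is computed, so the following is only a minimal, hypothetical sketch for a single query row of attention scores. It assumes that (1) scenario boundaries are the k largest jumps between adjacent scores, (2) pooling keeps only the maximum score inside each segment and masks the rest before softmax, and (3) the local correlation scheme is a fixed window around the query, merged with the pooled positions by taking their union. These choices, and all function names (`find_boundaries`, `boundary_pool`, `local_window`, `sparse_attention_row`), are illustrative assumptions, not the authors' implementation.

```python
import math

NEG_INF = float("-inf")


def find_boundaries(scores, k):
    """Assumed heuristic: the k positions with the largest jump
    |scores[j] - scores[j-1]| are treated as segment (scenario) starts."""
    jumps = [(abs(scores[j] - scores[j - 1]), j) for j in range(1, len(scores))]
    jumps.sort(key=lambda t: (-t[0], t[1]))  # largest jumps first, ties by index
    return sorted(j for _, j in jumps[:k])


def boundary_pool(scores, boundaries):
    """Mark only the maximum-score position inside each segment as kept."""
    n = len(scores)
    starts = sorted(set([0] + [b for b in boundaries if 0 < b < n]))
    ends = starts[1:] + [n]
    kept = [False] * n
    for s, e in zip(starts, ends):
        best = max(range(s, e), key=lambda j: scores[j])
        kept[best] = True
    return kept


def local_window(n, query, window):
    """Mark positions within `window` steps of the query as kept."""
    return [abs(j - query) <= window for j in range(n)]


def sparse_attention_row(scores, query, k=2, window=1):
    """Softmax over the union of boundary-pooled and local positions;
    every other position is masked to -inf and gets zero weight."""
    n = len(scores)
    pooled = boundary_pool(scores, find_boundaries(scores, k))
    local = local_window(n, query, window)
    masked = [s if (p or l) else NEG_INF for s, p, l in zip(scores, pooled, local)]
    m = max(masked)  # finite, because the query's own position is always kept
    exps = [math.exp(x - m) if x != NEG_INF else 0.0 for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    row = [0.1, 0.2, 0.15, 2.0, 2.1, 1.9, -1.0, -0.8]
    print([round(w, 3) for w in sparse_attention_row(row, query=0)])
```

With the sample row, the two largest jumps fall at positions 3 and 6, so the row splits into three segments. Only each segment's maximum (positions 1, 4 and 7) plus the local window around query 0 (positions 0 and 1) receive nonzero weight. Taking the union keeps the sketch simple. A real model would apply this per head on batched score matrices.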
Pages: 630-636
Page count: 7