SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Cited by: 0

Authors:

Jin, Tao [1]

Huang, Siyu [2]

Chen, Ming [3]

Li, Yingming [1]

Zhang, Zhongfei [4]

Affiliations:

[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou, Peoples R China

[2] Baidu Res, Shanghai, Peoples R China

[3] Alibaba Grp, Hangzhou, Peoples R China

[4] Binghamton Univ, Dept Comp Sci, Binghamton, NY USA

Source:

PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020

Funding:

National Key Research and Development Program of China;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we focus on the problem of applying the transformer structure to video captioning effectively. The vanilla transformer was proposed for uni-modal language generation tasks such as machine translation. However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in video representation. SBAT applies a boundary-aware pooling operation to the scores from multihead attention and selects diverse features from different scenarios. SBAT also includes a local correlation scheme to compensate for the local information loss caused by the sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost the multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms the state-of-the-art methods under most of the metrics.

Citation

Pages: 630-636

Number of pages: 7
