SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Cited by: 0

Authors:

Jin, Tao [1]

Huang, Siyu [2]

Chen, Ming [3]

Li, Yingming [1]

Zhang, Zhongfei [4]

Affiliations:

[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou, Peoples R China

[2] Baidu Res, Shanghai, Peoples R China

[3] Alibaba Grp, Hangzhou, Peoples R China

[4] Binghamton Univ, Dept Comp Sci, Binghamton, NY USA

Source:

PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020

Funding:

National Key Research and Development Program of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we focus on the problem of applying the transformer structure to video captioning effectively. The vanilla transformer was proposed for uni-modal language generation tasks such as machine translation. However, video captioning is a multimodal learning problem, and video features are highly redundant across time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in video representation. SBAT applies a boundary-aware pooling operation to the scores from multi-head attention and selects diverse features from different scenarios. SBAT also includes a local correlation scheme to compensate for the local information loss caused by the sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms state-of-the-art methods on most metrics.

Citation

Pages: 630-636

Number of pages: 7

