SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Cited by: 0

Authors:

Jin, Tao [1]

Huang, Siyu [2]

Chen, Ming [3]

Li, Yingming [1]

Zhang, Zhongfei [4]

Affiliations:

[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou, Peoples R China

[2] Baidu Res, Shanghai, Peoples R China

[3] Alibaba Grp, Hangzhou, Peoples R China

[4] Binghamton Univ, Dept Comp Sci, Binghamton, NY USA

Source:

PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020

Funding:

National Key Research and Development Program;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we focus on the problem of applying the transformer structure to video captioning effectively. The vanilla transformer was proposed for unimodal language generation tasks such as machine translation. However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in video representation. SBAT applies a boundary-aware pooling operation to the scores from multihead attention and selects diverse features from different scenarios. SBAT also includes a local correlation scheme to compensate for the local information loss caused by the sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost the multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms the state-of-the-art methods on most of the metrics.

Pages: 630-636

Page count: 7
