Joint learning of images and videos with a single Vision Transformer

Cited by: 0

Authors:

Shimizu, Shuki [1]

Tamaki, Toru [1]

Affiliation:

[1] Nagoya Inst Technol, Nagoya, Japan

Source:

2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA | 2023

Keywords:

DOI:

10.23919/MVA57639.2023.10215661

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this study, we propose a method for joint learning of images and videos using a single model. Images and videos are usually trained with separate models. We propose IV-ViT, a method in which a single Vision Transformer takes as input either a batch of images or a set of video frames, the latter with temporal aggregation by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.
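The late-fusion idea in the abstract can be illustrated with a minimal sketch. It assumes that late fusion means averaging per-frame class scores over time, which is one common form; the abstract does not state the aggregation operator actually used. The names `late_fusion`, `classify_batch`, and `toy_model` are hypothetical, and `toy_model` is only a stand-in for a shared Vision Transformer. An image is treated as a clip with a single frame, so the same model serves both modalities.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]   # stand-in for one image or one video frame
Scores = List[float]      # per-class scores for one frame or one sample


def late_fusion(frame_scores: Sequence[Scores]) -> Scores:
    """Average per-frame class scores over the temporal axis (assumed operator)."""
    if not frame_scores:
        raise ValueError("need at least one frame")
    num_frames = len(frame_scores)
    num_classes = len(frame_scores[0])
    return [sum(s[c] for s in frame_scores) / num_frames
            for c in range(num_classes)]


def classify_batch(model: Callable[[Frame], Scores],
                   batch: Sequence[Sequence[Frame]]) -> List[Scores]:
    """Apply one shared model to every frame, then fuse scores per sample.

    Each sample is a list of frames; an image is a list of length 1.
    """
    fused = []
    for sample in batch:
        per_frame = [model(frame) for frame in sample]
        fused.append(late_fusion(per_frame))
    return fused


def toy_model(frame: Frame) -> Scores:
    # Hypothetical stand-in for the shared ViT: two made-up "class scores".
    return [sum(frame), max(frame)]


image = [[1.0, 2.0]]                  # one frame
video = [[1.0, 2.0], [3.0, 4.0]]      # two frames
print(classify_batch(toy_model, [image, video]))
# prints [[3.0, 2.0], [5.0, 3.0]]
```

Because fusion happens only on the outputs, the per-frame model needs no temporal layers, which is what lets a single image model handle video frames in this sketch.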


Pages: 6

