Joint learning of images and videos with a single Vision Transformer

Cited by: 0

Authors:

Shimizu, Shuki [1]

Tamaki, Toru [1]

Affiliations:

[1] Nagoya Inst Technol, Nagoya, Japan

Source:

2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA | 2023

Keywords:

DOI:

10.23919/MVA57639.2023.10215661

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this study, we propose a method for jointly learning images and videos with a single model. In general, images and videos are trained with separate models. The proposed method, IV-ViT, feeds a batch of images to a Vision Transformer and also accepts a set of video frames, whose outputs are aggregated temporally by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.
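The idea of sharing one model between images and videos, with late fusion over frames, can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_frame_model` is a hypothetical stand-in for a Vision Transformer, and averaging the per-frame scores is an assumed aggregation, since the abstract does not state which late-fusion operator is used.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Frame = Sequence[float]   # stand-in for one image or one video frame
Scores = List[float]      # per-class scores


def toy_frame_model(frame: Frame, num_classes: int = 3) -> Scores:
    """Hypothetical stand-in for a ViT: maps one frame to class scores."""
    total = sum(frame)
    return [total * (k + 1) for k in range(num_classes)]


def predict_images(model: Callable[[Frame], Scores],
                   images: Sequence[Frame]) -> List[Scores]:
    """Image input: one prediction per image in the batch."""
    return [model(image) for image in images]


def predict_video(model: Callable[[Frame], Scores],
                  frames: Sequence[Frame]) -> Scores:
    """Video input: apply the same model to every frame, then fuse late
    by averaging the per-frame scores (assumed aggregation)."""
    per_frame = [model(frame) for frame in frames]
    return [fmean(class_scores) for class_scores in zip(*per_frame)]


if __name__ == "__main__":
    images = [[1.0, 2.0], [0.5, 0.5]]   # batch of two toy "images"
    video = [[1.0, 1.0], [3.0, 1.0]]    # one toy "video" with two frames
    print(predict_images(toy_frame_model, images))
    print(predict_video(toy_frame_model, video))
```

Both code paths call the same `model`, which is the point of the sketch: the image branch and the video branch differ only in whether the per-frame outputs are fused afterwards.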

Pages: 6

50 items in total

[21] Reversible Joint Blind Watermarking for Medical Images and Videos
Kavitha, K. J.
Shan, B. Priestly
HELIX, 2018, 8 (05): 3600 - 3606
[22] Satellite Images Analysis and Classification using Deep Learning-based Vision Transformer Model
Adegun, Adekanmi Adeyinka
Viriri, Serestina
2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023: 1275 - 1279
[23] A Fusion Deep Learning Model of ResNet and Vision Transformer for 3D CT Images
Liu, Chiyu
Sun, Cunjie
IEEE ACCESS, 2024, 12: 93389 - 93397
[24] One-Shot GAN: Learning to Generate Samples from Single Images and Videos
Sushko, Vadim
Gall, Juergen
Khoreva, Anna
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 2596 - 2600
[25] Automated classification of remote sensing satellite images using deep learning based vision transformer
Adegun, Adekanmi
Viriri, Serestina
Tapamo, Jules-Raymond
APPLIED INTELLIGENCE, 2024, 54 (24): 13018 - 13037
[26] Stellar Classification with Vision Transformer and SDSS Photometric Images
Yang, Yi
Li, Xin
UNIVERSE, 2024, 10 (05)
[27] Exploring vision transformer: classifying electron-microscopy pollen images with transformer
Duan, Kaibo
Bao, Shi
Liu, Zhiqiang
Cui, Shaodong
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (01): 735 - 748
[28] A vision transformer for emphysema classification using CT images
Wu, Yanan
Qi, Shouliang
Sun, Yu
Xia, Shuyue
Yao, Yudong
Qian, Wei
PHYSICS IN MEDICINE AND BIOLOGY, 2021, 66 (24)
[29] Image Classification Using Vision Transformer for EtC Images
Hamano, Genki
Imaizumi, Shoko
Kiya, Hitoshi
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022: 1506 - 1513
[30] Exploring vision transformer: classifying electron-microscopy pollen images with transformer
Kaibo Duan
Shi Bao
Zhiqiang Liu
Shaodong Cui
Neural Computing and Applications, 2023, 35: 735 - 748