Joint learning of images and videos with a single Vision Transformer

Cited by: 0
Authors
Shimizu, Shuki [1]
Tamaki, Toru [1]
Affiliations
[1] Nagoya Inst Technol, Nagoya, Japan
Keywords
DOI
10.23919/MVA57639.2023.10215661
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we propose a method for joint learning of images and videos using a single model. In general, separate models are trained for images and for videos. The proposed method, IV-ViT, uses a single Vision Transformer that takes a batch of images as input and also takes a set of video frames, whose outputs are aggregated over time by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.
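The idea of sharing one model between images and videos with late fusion can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_frame_model` is a hypothetical stand-in for the shared Vision Transformer, frames are represented as flat lists of numbers, and averaging per-frame scores over time is an assumed form of late fusion (the abstract does not specify the aggregation operator).

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]   # toy stand-in for one image or one video frame
Logits = List[float]      # per-class scores


def toy_frame_model(frame: Frame) -> Logits:
    """Hypothetical stand-in for the shared ViT: one frame -> two class scores."""
    total = sum(frame)
    return [total, -total]


def classify_images(images: Sequence[Frame],
                    model: Callable[[Frame], Logits]) -> List[Logits]:
    """Image path: every image in the batch is scored independently."""
    return [model(image) for image in images]


def classify_videos(videos: Sequence[Sequence[Frame]],
                    model: Callable[[Frame], Logits]) -> List[Logits]:
    """Video path: score each frame with the same model, then fuse over time.

    Late fusion is assumed here to be a plain mean of the per-frame scores.
    """
    fused: List[Logits] = []
    for frames in videos:
        per_frame = [model(frame) for frame in frames]
        num_classes = len(per_frame[0])
        fused.append([
            sum(scores[c] for scores in per_frame) / len(per_frame)
            for c in range(num_classes)
        ])
    return fused


if __name__ == "__main__":
    images = [[1.0, 2.0], [0.5, 0.5]]        # batch of two "images"
    videos = [[[1.0, 1.0], [3.0, 1.0]]]      # one "video" with two frames
    print(classify_images(images, toy_frame_model))
    print(classify_videos(videos, toy_frame_model))
```

Because both paths call the same `model`, its parameters would be shared between image and video inputs; only the temporal averaging step is specific to video.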
Pages: 6