Joint learning of images and videos with a single Vision Transformer

Cited by: 0
Authors
Shimizu, Shuki [1]
Tamaki, Toru [1]
Affiliations
[1] Nagoya Inst Technol, Nagoya, Japan
DOI
10.23919/MVA57639.2023.10215661
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we propose a method for jointly learning images and videos with a single model. Images and videos are usually handled by separate models. The proposed method, IV-ViT, feeds a batch of images to a Vision Transformer and also accepts a set of video frames, which are temporally aggregated by late fusion. We present experimental results on two image datasets and two action recognition datasets.
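The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that video frames are flattened into the batch dimension so the same image model processes them, and that late fusion means averaging per-frame class scores over time; the abstract does not state the aggregation operator. The linear `toy_image_model` is a hypothetical stand-in for a Vision Transformer, and all function names are illustrative.

```python
import numpy as np


def toy_image_model(images: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a ViT: maps (N, D) inputs to (N, C) logits."""
    return images @ weights


def forward_images(images: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Image path: a batch of B images goes straight through the model."""
    return toy_image_model(images, weights)


def forward_videos(videos: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Video path: treat the B*T frames as one batch of images, then fuse.

    videos has shape (B, T, D). Averaging over T is an assumed form of
    late fusion, chosen here only for illustration.
    """
    b, t, d = videos.shape
    frame_logits = toy_image_model(videos.reshape(b * t, d), weights)
    return frame_logits.reshape(b, t, -1).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(8, 3))   # D=8 features, C=3 classes
    images = rng.normal(size=(4, 8))    # batch of 4 "images"
    videos = rng.normal(size=(2, 5, 8))  # 2 clips of 5 frames each
    print(forward_images(images, weights).shape)
    print(forward_videos(videos, weights).shape)
```

Because both paths call the same model with the same weights, a clip made of identical frames yields the same logits as the single image, which is a quick sanity check on the fusion step.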
Pages: 6
Related papers
50 items in total
  • [31] Manipulation Detection in Satellite Images Using Vision Transformer
    Horvath, Janos
    Baireddy, Sriram
    Hao, Hanxiang
    Montserrat, Daniel Mas
    Delp, Edward J.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1032 - 1041
  • [32] Recognizing persons in images by learning from videos
    Hoerster, Eva
    Lux, Jochen
    Lienhart, Rainer
    MULTIMEDIA CONTENT ACCESS: ALGORITHMS AND SYSTEMS, 2007, 6506
  • [33] Learning the representation of instrument images in laparoscopy videos
    Kletz, Sabrina
    Schoeffmann, Klaus
    Husslein, Heinrich
    HEALTHCARE TECHNOLOGY LETTERS, 2019, 6 (06) : 197 - 203
  • [34] ViT-MPI: Vision Transformer Multiplane Images for Surgical Single-View View Synthesis
    Han, Chenming
    Shao, Ruizhi
    Wu, Gaochang
    Shao, Hang
    Liu, Yebin
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT I, 2024, 14473 : 28 - 40
  • [35] Vision Transformer Adapters for Generalizable Multitask Learning
    Bhattacharjee, Deblina
    Susstrunk, Sabine
    Salzmann, Mathieu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 18969 - 18980
  • [36] Anomaly detection in surveillance videos using Transformer with margin learning
    Wang, Dicong
    Wu, Kaijun
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [37] A New Contrastive Learning-Based Vision Transformer for Sentiment Analysis Using Scene Text Images
    Palaiahnakote, Shivakumara
    Kapri, Dhruv
    Saleem, Muhammad Hammad
    Pal, Umapada
    International Journal of Pattern Recognition and Artificial Intelligence, 2024, 38 (16)
  • [38] Medical Report Generation from Medical Images Using Vision Transformer and Bart Deep Learning Architectures
    Ucan, Murat
    Kaya, Buket
    Kaya, Mehmet
    Alhajj, Reda
    SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2024, PT IV, 2025, 15214 : 257 - 267
  • [39] Online Continual Learning with Contrastive Vision Transformer
    Wang, Zhen
    Liu, Liu
    Kong, Yajing
    Guo, Jiaxian
    Tao, Dacheng
    COMPUTER VISION, ECCV 2022, PT XX, 2022, 13680 : 631 - 650
  • [40] Effective and Robust: A Discriminative Temporal Learning Transformer for Satellite Videos
    Zhang, Xin
    Jiao, Licheng
    Li, Lingling
    Liu, Xu
    Liu, Fang
    Yang, Shuyuan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62