Joint learning of images and videos with a single Vision Transformer

Cited by: 0

Authors:

Shimizu, Shuki [1]

Tamaki, Toru [1]

Affiliation:

[1] Nagoya Inst Technol, Nagoya, Japan

Source:

2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA | 2023

DOI:

10.23919/MVA57639.2023.10215661

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this study, we propose a method for joint learning of images and videos using a single model. In general, images and videos are handled by separate models. The proposed method, IV-ViT, uses one Vision Transformer that takes a batch of images as input and also takes a set of video frames, which are temporally aggregated by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.
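The late-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_frame_model` is a hypothetical stand-in for the shared Vision Transformer, and averaging per-frame scores is an assumed choice of aggregation, since the abstract only says "temporal aggregation by late fusion".

```python
from typing import Callable, List, Sequence

Frame = List[float]   # hypothetical stand-in for one image or one video frame
Scores = List[float]  # per-class scores produced for a single frame


def toy_frame_model(frame: Frame) -> Scores:
    """Hypothetical stand-in for the shared ViT: one frame in, two class scores out."""
    total = sum(frame)
    return [total, -total]


def predict_images(model: Callable[[Frame], Scores],
                   images: Sequence[Frame]) -> List[Scores]:
    """Image path: every image in the batch is scored independently."""
    return [model(image) for image in images]


def predict_video(model: Callable[[Frame], Scores],
                  frames: Sequence[Frame]) -> Scores:
    """Video path: score each frame with the same model, then fuse late.

    Assumption: the fusion is a plain mean over frames.
    """
    if not frames:
        raise ValueError("a video needs at least one frame")
    per_frame = [model(frame) for frame in frames]
    num_classes = len(per_frame[0])
    return [sum(scores[c] for scores in per_frame) / len(per_frame)
            for c in range(num_classes)]


image_scores = predict_images(toy_frame_model, [[1.0], [2.0]])
video_scores = predict_video(toy_frame_model, [[1.0, 2.0], [3.0, 4.0]])
```

Because the video path only calls the per-frame model and then combines its outputs, the same weights can serve both input types, which is the property the abstract describes as using a single model.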

Pages: 6
