A Comprehensive Survey of Recent Transformers in Image, Video and Diffusion Models

Cited by: 2
Authors
Le, Dinh Phu Cuong [1 ,2 ]
Wang, Dong [1 ]
Le, Viet-Tuan [3 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[2] Yersin Univ Da Lat, Fac Informat Technol, Da Lat 66100, Vietnam
[3] Ho Chi Minh City Open Univ, Fac Informat Technol, Ho Chi Minh City 722000, Vietnam
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024, Volume 80, Issue 01
Funding
Natural Science Foundation of Hunan Province; National Natural Science Foundation of China;
Keywords
Transformer; vision transformer; self-attention; hierarchical transformer; diffusion models;
DOI
10.32604/cmc.2024.050790
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Transformer models have emerged as dominant networks for various tasks in computer vision compared to Convolutional Neural Networks (CNNs). The transformers demonstrate the ability to model long-range dependencies by utilizing a self-attention mechanism. This study aims to provide a comprehensive survey of recent transformer-based approaches in image and video applications, as well as diffusion models. We begin by discussing existing surveys of vision transformers and comparing them to this work. Then, we review the main components of a vanilla transformer network, including the self-attention mechanism, feed-forward network, position encoding, etc. In the main part of this survey, we review recent transformer-based models in three categories: Transformer for downstream tasks, Vision Transformer for Generation, and Vision Transformer for Segmentation. We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models. We compare the performance of various hierarchical transformer networks for multiple tasks on popular benchmark datasets. Finally, we explore some future research directions to further improve the field.
Pages: 37 - 60
Page count: 24
Related papers
50 in total
  • [1] A Survey of Generative Models for Image and Video with Diffusion Model
    Koh, Byoung Soo
    Park, Hyeong Cheol
    Park, Jin Ho
    HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2024, 14
  • [2] Comprehensive exploration of diffusion models in image generation: a survey
    Chen, Hang
    Xiang, Qian
    Hu, Jiaxin
    Ye, Meilin
    Yu, Chao
    Cheng, Hao
    Zhang, Lei
    ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (04)
  • [3] A Survey on Video Diffusion Models
    Xing, Zhen
    Feng, Qijun
    Chen, Haoran
    Dai, Qi
    Hu, Hang
    Xu, Hang
    Wu, Zuxuan
    Jiang, Yu-gang
    ACM COMPUTING SURVEYS, 2025, 57 (02)
  • [4] Video Transformers: A Survey
    Selva, Javier
    Johansen, Anders S.
    Escalera, Sergio
    Nasrollahi, Kamal
    Moeslund, Thomas B.
    Clapes, Albert
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12922 - 12943
  • [5] Image captioning by diffusion models: A survey
    Daneshfar, Fatemeh
    Bartani, Ako
    Lotfi, Pardis
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
  • [6] Diffusion Models: A Comprehensive Survey of Methods and Applications
    Yang, Ling
    Zhang, Zhilong
    Song, Yang
    Hong, Shenda
    Xu, Runsheng
    Zhao, Yue
    Zhang, Wentao
    Cui, Bin
    Yang, Ming-Hsuan
    ACM COMPUTING SURVEYS, 2024, 56 (04)
  • [7] Diffusion models in medical imaging: A comprehensive survey
    Kazerouni, Amirhossein
    Aghdam, Ehsan Khodapanah
    Heidari, Moein
    Azad, Reza
    Fayyaz, Mohsen
    Hacihaliloglu, Ilker
    Merhof, Dorit
    MEDICAL IMAGE ANALYSIS, 2023, 88
  • [8] A Comprehensive Survey of Recent Approaches on Microarray Image Data
    Roopa C.K.
    Priya M.P.
    Harish B.S.
    Maheshan M.S.
    SN Computer Science, 5 (1)
  • [9] Diffusion Models for Medical Image Computing: A Survey
    Shi, Yaqing
    Abulizi, Abudukelimu
    Wang, Hao
    Feng, Ke
    Abudukelimu, Nihemaiti
    Su, Youli
    Abudukelimu, Halidanmu
    TSINGHUA SCIENCE AND TECHNOLOGY, 2025, 30 (01) : 357 - 383
  • [10] Scalable Diffusion Models with Transformers
    Peebles, William
    Xie, Saining
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023 : 4172 - 4182