Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?

Cited by: 49
Authors
Moutik, Oumaima [1]
Sekkat, Hiba [1]
Tigani, Smail [1]
Chehri, Abdellah [2]
Saadane, Rachid [3]
Tchakoucht, Taha Ait [1]
Paul, Anand [4]
Affiliations
[1] Euro Mediterranean Univ, Euromed Res Ctr, Engn Unit, Fes 30030, Morocco
[2] Royal Mil Coll Canada, Dept Math & Comp Sci, Kingston, ON K7K 7B4, Canada
[3] Hassania Sch Publ Works, SIRC LaGeS, Casablanca 8108, Morocco
[4] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu 41566, South Korea
Keywords
convolutional neural networks; vision transformers; recurrent neural networks; conversational systems; action recognition; natural language understanding; computer vision; attention
DOI
10.3390/s23020734
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Understanding actions in videos remains a significant challenge in computer vision and has been the subject of extensive research over the last decades. Convolutional neural networks (CNNs) are a significant component of this topic and have played a crucial role in the rise of deep learning. Inspired by the human visual system, CNNs have been applied to visual data and have addressed a range of challenges in computer vision and video/image analysis, including action recognition (AR). More recently, following the success of the transformer in natural language processing (NLP), transformers began to set new trends in vision tasks, prompting a debate over whether Vision Transformer (ViT) models will replace CNNs for action recognition in video clips. This paper examines this trending topic in detail: it studies CNNs and Transformers for action recognition separately and then compares them in terms of the accuracy-complexity trade-off. Finally, based on the outcome of the performance analysis, it discusses whether CNNs or Vision Transformers will win the race.
Pages: 21