Unified deep learning model for multitask representation and transfer learning: image classification, object detection, and image captioning

Cited by: 1
Authors
Bayisa, Leta Yobsan [1]
Wang, Weidong [2]
Wang, Qingxian [2]
Ukwuoma, Chiagoziem C. [3, 4, 5]
Gutema, Hirpesa Kebede [1]
Endris, Ahmed [6]
Abu, Turi [7]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu, Sichuan, Peoples R China
[3] Chengdu Univ Technol, Oxford Brookes Univ, Sino British Collaborat Educ, Chengdu 610059, Sichuan, Peoples R China
[4] Chengdu Univ Technol, Coll Nucl Technol & Automation Engn, Chengdu 610059, Sichuan, Peoples R China
[5] Chengdu Univ Technol, Sichuan Engn Technol Res Ctr Ind Internet Intellig, Chengdu 610059, Sichuan, Peoples R China
[6] Chinese Acad Sci, Shenzhen Inst Adv Technol, Paul C Lauterbur Res Ctr Biomed Imaging, Shenzhen, Peoples R China
[7] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Keywords
Multitask representation learning; Transfer learning; Encoder-decoder; Attention mechanism
DOI
10.1007/s13042-024-02177-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The application of deep learning has demonstrated impressive performance in computer vision tasks such as object detection, image classification, and image captioning. Though most models excel at performing single vision or language tasks, designing a single architecture that balances task specialization, performance, and adaptability across diverse tasks is challenging. To effectively address vision and language integration challenges, a combination of text embeddings and visual representation is necessary to understand dependencies of each subarea for multiple tasks. This paper proposes a single architecture that can handle various tasks in computer vision with fine-tuning capabilities for other specific vision and language tasks. The proposed model employs a modified DenseNet201 as a feature extractor (network backbone), an encoder-decoder architecture, and a task-specific head for inference. To tackle overfitting and improve precision, enhanced data augmentation and normalization techniques are employed. The model's robustness is evaluated on over five datasets for different tasks: image classification, object detection, image captioning, and adversarial attack and defense. The experimental results demonstrate competitive performance compared to other works on CIFAR-10, CIFAR-100, Flickr8, Flickr30, Caltech10, and other task-specific datasets such as OCT, BreakHis, and so on. The proposed model is flexible and easy to adapt to new tasks, as it can also be extended to other vision and language tasks through fine-tuning with task-specific input indices.
Pages: 4617-4637
Page count: 21