Unified deep learning model for multitask representation and transfer learning: image classification, object detection, and image captioning

Times cited: 1
Authors
Bayisa, Leta Yobsan [1]
Wang, Weidong [2]
Wang, Qingxian [2]
Ukwuoma, Chiagoziem C. [3,4,5]
Gutema, Hirpesa Kebede [1]
Endris, Ahmed [6]
Abu, Turi [7]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu, Sichuan, Peoples R China
[3] Chengdu Univ Technol, Oxford Brooks Univ, Sino British Collaborat Educ, Chengdu 610059, Sichuan, Peoples R China
[4] Chengdu Univ Technol, Coll Nucl Technol & Automation Engn, Chengdu 610059, Sichuan, Peoples R China
[5] Chengdu Univ Technol, Sichuan Engn Technol Res Ctr Ind Internet Intellig, Chengdu 610059, Sichuan, Peoples R China
[6] Chinese Acad Sci, Shenzhen Inst Adv Technol, Paul C Lauterbur Res Ctr Biomed Imaging, Shenzhen, Peoples R China
[7] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Keywords
Multitask representation learning; Transfer learning; Encoder-decoder; Attention mechanism
DOI
10.1007/s13042-024-02177-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has demonstrated impressive performance on computer vision tasks such as object detection, image classification, and image captioning. Although most models excel at a single vision or language task, designing one architecture that balances task specialization, performance, and adaptability across diverse tasks remains challenging. Integrating vision and language effectively requires combining text embeddings with visual representations so that the model can capture the dependencies among the subareas involved in each task. This paper proposes a single architecture that handles various computer vision tasks and can be fine-tuned for other specific vision and language tasks. The proposed model uses a modified DenseNet201 as the feature extractor (network backbone), an encoder-decoder architecture, and a task-specific head for inference. Enhanced data augmentation and normalization techniques are used to reduce overfitting and improve precision. The model's robustness is evaluated on more than five datasets covering different tasks: image classification, object detection, image captioning, and adversarial attack and defense. The experimental results show performance competitive with other works on CIFAR-10, CIFAR-100, Flickr8k, Flickr30k, Caltech10, and other task-specific datasets such as OCT and BreakHis. The proposed model is flexible and easy to adapt: it can be extended to other vision and language tasks through fine-tuning with task-specific input indices.
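The general pattern described in the abstract (one shared feature extractor feeding several task-specific heads, with a task index selecting the head) can be illustrated with a toy sketch. This is not the authors' implementation: the single dense layer standing in for the modified DenseNet201 backbone, the class name `MultiTaskModel`, the `add_task` method, and the layer sizes are all hypothetical. Every head here is a simple classifier; real detection and captioning heads would produce boxes or token sequences instead.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def dense(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine layer: computes W @ x + b with plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskModel:
    """Hypothetical hard-parameter-sharing model: one shared encoder, one head per task index."""

    def __init__(self, in_dim: int, feat_dim: int, head_sizes: Dict[int, int], seed: int = 0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        # Shared encoder: a stand-in for a real backbone such as DenseNet201.
        self.encoder = random_layer(feat_dim, in_dim, rng)
        # Task-specific heads keyed by task index.
        self.heads = {t: random_layer(n, feat_dim, rng) for t, n in head_sizes.items()}

    def features(self, x: Vector) -> Vector:
        """Shared representation reused by every task."""
        return relu(dense(*self.encoder, x))

    def forward(self, x: Vector, task_id: int) -> Vector:
        """Route the shared features through the head selected by task_id."""
        if task_id not in self.heads:
            raise KeyError(f"no head registered for task index {task_id}")
        return softmax(dense(*self.heads[task_id], self.features(x)))

    def add_task(self, task_id: int, n_out: int, seed: int = 1) -> None:
        """Transfer to a new task by attaching a fresh head; the encoder is left unchanged."""
        self.heads[task_id] = random_layer(n_out, self.feat_dim, random.Random(seed))


model = MultiTaskModel(in_dim=4, feat_dim=8, head_sizes={0: 10, 1: 100})
x = [0.2, -0.1, 0.5, 0.3]
probs_a = model.forward(x, task_id=0)  # e.g. a 10-class head
probs_b = model.forward(x, task_id=1)  # e.g. a 100-class head
model.add_task(2, n_out=2)             # hypothetical new binary task
```

In this sketch, adding a task only registers a new head, so the shared encoder's weights stay untouched; during actual fine-tuning one would typically train the new head first and optionally unfreeze the backbone afterwards, which the sketch does not model.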
Pages: 4617-4637
Page count: 21