Enhancing Image Description Generation through Deep Reinforcement Learning: Fusing Multiple Visual Features and Reward Mechanisms

Cited by: 0
Authors
Li, Yan [1]
Wang, Qiyuan [1]
Jia, Kaidi [1]
Affiliations
[1] Gansu Univ Polit Sci & Law, Sch Cyber Secur, Lanzhou 730070, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 78 / Issue 02
Keywords
Image description; deep reinforcement learning; attention mechanism;
DOI
10.32604/cmc.2024.047822
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The image description task lies at the intersection of computer vision and natural language processing, and it has important applications, including helping computers understand images and providing information to the visually impaired. This study presents an approach that employs deep reinforcement learning to improve the accuracy of natural language descriptions of images. Our method focuses on refining the reward function in deep reinforcement learning, facilitating the generation of precise descriptions by aligning visual and textual features more closely. Our approach comprises three key components. First, it uses Residual Network 101 (ResNet-101) and Faster Region-based Convolutional Neural Network (Faster R-CNN) to extract average and local image features, respectively, and then applies a dual attention mechanism to fuse these features. Second, a Transformer model derives contextual semantic features from the textual data. Finally, a two-layer long short-term memory (LSTM) network generates the descriptive text, guided by the value and reward functions. Compared with image description methods that rely on deep learning, the Bilingual Evaluation Understudy (BLEU-1) score is 0.762, which is 1.6% higher, and the BLEU-4 score is 0.299. The Consensus-based Image Description Evaluation (CIDEr) score is 0.998 and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score is 0.552, the latter an improvement of 0.36%. These results attest to the viability of our approach and indicate its advantage over the compared methods in image description. Future research can explore the integration of our method with other artificial intelligence (AI) domains, such as emotional AI, to create more nuanced and context-aware systems.
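The BLEU-1 figure reported in the abstract is built on clipped unigram precision combined with a brevity penalty. As a rough illustration of how such a score behaves, the following minimal Python sketch computes a sentence-level BLEU-1 against a single reference. It is not the authors' evaluation code: the function name `bleu1`, the lowercasing, and the whitespace tokenization are assumptions made for this example, and published captioning results are normally computed at corpus level against several references per image.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against one reference (illustrative sketch only)."""
    # Assumption: lowercase + whitespace tokenization; real pipelines use a proper tokenizer.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Clipped matches: a candidate word is credited at most as many times
    # as it occurs in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c), which penalizes overly short candidates.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    return bp * precision


if __name__ == "__main__":
    print(bleu1("a dog runs on the grass", "a dog runs on the grass"))
    print(bleu1("the the the", "the cat"))
    print(bleu1("a dog", "a dog runs fast"))
```

The second call shows why clipping matters: repeating a word that appears once in the reference is credited only once. The third shows the brevity penalty reducing the score of a correct but very short caption.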
Pages: 2469 - 2489
Number of pages: 21
Related papers
50 in total
  • [41] Enhancing Medical Diagnosis Through Deep Learning and Machine Learning Approaches in Image Analysis
    Usmani, Usman Ahmad
    Happonen, Ari
    Watada, Junzo
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 4, INTELLISYS 2023, 2024, 825 : 449 - 468
  • [42] Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning
    Xu, Ning
    Zhang, Hanwang
    Liu, An-An
    Nie, Weizhi
    Su, Yuting
    Nie, Jie
    Zhang, Yongdong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (05) : 1372 - 1383
  • [43] A Reward Function Using Image Processing for a Deep Reinforcement Learning Approach Applied to the Sonic the Hedgehog Game
    de Souza, Felipe Rafael
    Miranda, Thiago Silva
    Bernardino, Heder Soares
    INTELLIGENT SYSTEMS, PT II, 2022, 13654 : 181 - 195
  • [44] CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
    Le, Hung
    Wang, Yue
    Gotmare, Akhilesh Deepak
    Savarese, Silvio
    Hoi, Steven C. H.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [45] Visual SLAM method for dynamic environment based on deep learning image features
    Liu D.
    Yu T.
    Cong M.
    Du Y.
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, 52 (06) : 156 - 163
  • [46] Boosting Performance of Visual Servoing Using Deep Reinforcement Learning From Multiple Demonstrations
    Aflakian, Ali
    Rastegharpanah, Alireza
    Stolkin, Rustam
    IEEE ACCESS, 2023, 11 : 26512 - 26520
  • [47] Enhancing visual quality of spatial image steganography using SqueezeNet deep learning network
    Nagham Hamid
    Balasem Salem Sumait
    Bilal Ibrahim Bakri
    Osamah Al-Qershi
    Multimedia Tools and Applications, 2021, 80 : 36093 - 36109
  • [48] Enhancing visual quality of spatial image steganography using SqueezeNet deep learning network
    Hamid, Nagham
    Sumait, Balasem Salem
    Bakri, Bilal Ibrahim
    Al-Qershi, Osamah
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (28-29) : 36093 - 36109
  • [49] Enhancing image retrieval through entropy-based deep metric learning
    Rahbar K.
    Taheri F.
    Multimedia Tools and Applications, 2025, 84 (11) : 9065 - 9091
  • [50] Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning
    He, Huiguo
    Wang, Tianfu
    Yang, Huan
    Fu, Jianlong
    Yuan, Nicholas Jing
    Yin, Jian
    Chao, Hongyang
    Zhang, Qi
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6831 - 6840