Enhancing Image Description Generation through Deep Reinforcement Learning: Fusing Multiple Visual Features and Reward Mechanisms

Authors
Li, Yan [1]
Wang, Qiyuan [1]
Jia, Kaidi [1]
Affiliation
[1] Gansu Univ Polit Sci & Law, Sch Cyber Secur, Lanzhou 730070, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024, Vol. 78, No. 2
Keywords
Image description; deep reinforcement learning; attention mechanism
DOI
10.32604/cmc.2024.047822
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The image description task lies at the intersection of computer vision and natural language processing, and it has important applications, including helping computers understand images and making visual information accessible to the visually impaired. This study presents an approach that employs deep reinforcement learning to improve the accuracy of natural language descriptions of images. Our method focuses on refining the reward function in deep reinforcement learning, so that more precise descriptions are generated by aligning visual and textual features more closely. The approach comprises three key components. First, it uses Residual Network 101 (ResNet-101) and Faster Region-based Convolutional Neural Network (Faster R-CNN) to extract average and local image features, respectively, and then fuses them with a dual attention mechanism. Second, a Transformer model derives contextual semantic features from the textual data. Finally, the descriptive text is generated by a two-layer long short-term memory (LSTM) network, guided by the value and reward functions. Compared with the deep-learning-based image description method, our approach achieves a Bilingual Evaluation Understudy (BLEU-1) score of 0.762, which is 1.6% higher, and a BLEU-4 score of 0.299. The Consensus-based Image Description Evaluation (CIDEr) score is 0.998, and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score is 0.552, the latter an improvement of 0.36%. These results support the viability of our approach and indicate its advantage in image description. Future research can explore integrating our method with other artificial intelligence (AI) domains, such as emotional AI, to create more nuanced and context-aware systems.
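As an illustration of the BLEU-1 metric reported in the abstract, the following is a minimal sentence-level sketch of the standard definition: clipped unigram precision multiplied by a brevity penalty. It is not the authors' evaluation code. The function name `bleu1`, the lowercasing, and the whitespace tokenization are assumptions made for this example; published scores are normally computed at corpus level with an established evaluation toolkit.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split (an assumption, not the paper's setup).
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each token, the maximum count observed in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)

    # Clip each candidate token count by that maximum.
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)

    # Reference length closest to the candidate length (ties go to the shorter).
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))

    # Brevity penalty: 1 if the candidate is longer than r, else exp(1 - r/c).
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * precision


if __name__ == "__main__":
    score = bleu1("a dog runs on the grass", ["a dog is running on the grass"])
    print(round(score, 4))
```

In this example five of the six candidate tokens appear in the reference, so the precision is 5/6, and the candidate is one token shorter than the reference, so the brevity penalty is exp(1 - 7/6).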
Pages: 2469-2489
Page count: 21