Long-tail image captioning with dynamic semantic memory network

Authors
Liu, Hao [1]
Yang, Xiaoshan [1]
Xu, Changsheng [1]
Affiliation
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Knowledge management; Statistical tests
DOI
Not available
Abstract
Image captioning takes an image as input and outputs a text sequence. Most images in current image captioning datasets are captured from the daily lives of internet users, so their captions consist of a few common words and many rare words. Most existing studies focus on improving captioning performance over the whole dataset and overlook performance on rare words. To address this problem, we introduce long-tail image captioning together with a dynamic semantic memory network (DSMN). Long-tail image captioning requires a model to improve the generation of rare words while maintaining good performance on common words. DSMN dynamically mines the global semantic relationships between rare words and common words, enabling knowledge transfer from common words to rare words. Results show that DSMN improves the semantic representation of rare words by combining global word-level semantic relations with the local semantic information of the input image and the words generated so far. To better evaluate long-tail image captioning, we construct a task-specific test split, Few-COCO, from the original MS COCO Captioning dataset. In quantitative and qualitative experiments on Few-COCO, DSMN achieves a rare-word description precision of 0.6028%, a recall of 0.3234%, and an F-1 value of 0.3567%, a significant improvement over baseline methods. © 2022 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
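The abstract reports precision, recall, and F-1 for rare-word description but does not say how they are computed. The sketch below shows one plausible formulation, offered as an assumption rather than the authors' protocol: a rare word counts as correct for an image when it appears in both the generated caption and that image's reference captions, and per-word scores are macro-averaged. The function name `rare_word_prf` and the toy data are hypothetical.

```python
from collections import Counter


def rare_word_prf(generated, references, rare_words):
    """Macro-averaged precision, recall and F1 over a list of rare words.

    generated:  list of token lists, one generated caption per image
    references: list of token sets, the union of reference captions per image
    rare_words: list of words treated as rare (assumed to be given)
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gen_tokens, ref_tokens in zip(generated, references):
        gen_set, ref_set = set(gen_tokens), set(ref_tokens)
        for word in rare_words:
            in_gen, in_ref = word in gen_set, word in ref_set
            if in_gen and in_ref:
                tp[word] += 1      # word generated and supported by a reference
            elif in_gen:
                fp[word] += 1      # word generated but absent from references
            elif in_ref:
                fn[word] += 1      # word expected but not generated

    if not rare_words:
        return 0.0, 0.0, 0.0

    totals = [0.0, 0.0, 0.0]
    for word in rare_words:
        predicted = tp[word] + fp[word]
        expected = tp[word] + fn[word]
        p = tp[word] / predicted if predicted else 0.0
        r = tp[word] / expected if expected else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        for i, value in enumerate((p, r, f)):
            totals[i] += value
    n = len(rare_words)
    return totals[0] / n, totals[1] / n, totals[2] / n


# Toy data, invented purely for illustration.
generated = [["a", "zebra", "grazes"], ["a", "dog", "runs"], ["a", "zebra", "stands"]]
references = [{"a", "zebra", "eats", "grass"}, {"a", "dog", "runs"}, {"a", "horse", "stands"}]
precision, recall, f1 = rare_word_prf(generated, references, ["zebra", "horse"])
print(precision, recall, f1)
```

With macro-averaging, the averaged F-1 is the mean of the per-word F-1 scores, so it need not equal the harmonic mean of the averaged precision and recall.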
Pages: 1399-1408