Long-tail image captioning with dynamic semantic memory network

Cited by: 0
Authors
Liu, Hao [1]
Yang, Xiaoshan [1]
Xu, Changsheng [1]
Affiliations
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning - Knowledge management - Statistical tests
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Image captioning takes an image as input and outputs a text sequence. Most images in image captioning datasets are captured from the daily life of internet users, so their captions consist of a few common words and many rare words. Most existing studies focus on improving captioning performance over the whole dataset and disregard performance on rare words. To address this problem, we introduce long-tail image captioning with a dynamic semantic memory network (DSMN). Long-tail image captioning requires a model to improve the generation of rare words while maintaining good performance on common words. The DSMN model dynamically mines the global semantic relationships between rare words and common words, enabling knowledge transfer from common words to rare words. Results show that DSMN improves the semantic representation of rare words by combining global word-level semantic relations with the local semantic information of the input image and the generated words. To better evaluate long-tail image captioning, we constructed a task-specific test split, Few-COCO, from the original MS COCO Captioning dataset. In quantitative and qualitative experiments on Few-COCO, the DSMN model reaches a rare-word description precision of 0.6028%, a recall of 0.3234%, and an F-1 value of 0.3567%, a significant improvement over the baseline methods. © 2022 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
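The abstract reports precision, recall, and F-1 for rare-word description but does not define how they are computed. The following is a minimal sketch of one plausible reading, not the authors' evaluation code. It assumes rare words are those below a frequency threshold in a caption corpus and that scores are micro-averaged over images. The function names `rare_words` and `rare_word_prf`, the threshold `max_count`, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
from collections import Counter


def rare_words(captions, max_count):
    """Return the words that occur at most `max_count` times in `captions`.

    Assumption: rarity is decided by raw corpus frequency; the paper may
    use a different criterion.
    """
    counts = Counter(w for caption in captions for w in caption.lower().split())
    return {w for w, n in counts.items() if n <= max_count}


def rare_word_prf(generated, references, rare):
    """Micro-averaged precision, recall, and F1 restricted to rare words.

    `generated` is a list of caption strings and `references` is a list of
    lists of reference captions, aligned by image.
    """
    tp = fp = fn = 0
    for gen, refs in zip(generated, references):
        gen_rare = set(gen.lower().split()) & rare
        ref_rare = {w for r in refs for w in r.lower().split()} & rare
        tp += len(gen_rare & ref_rare)   # rare words correctly generated
        fp += len(gen_rare - ref_rare)   # rare words generated but not in any reference
        fn += len(ref_rare - gen_rare)   # rare reference words that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Under this reading, a model that only emits common words gets zero recall on rare words even if its whole-dataset scores are high, which is the gap the abstract describes. The reported F-1 is not the harmonic mean of the reported precision and recall, so the paper presumably averages differently (for example per word or per image) than this micro-averaged sketch.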
Pages: 1399-1408