Long-tail image captioning with dynamic semantic memory network

Cited by: 0
Authors
Liu, Hao [1]
Yang, Xiaoshan [1]
Xu, Changsheng [1]
Affiliations
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Knowledge management; Statistical tests
DOI
Not available
Abstract
Image captioning takes an image as input and outputs a text sequence. Most images in current image captioning datasets are captured from the daily lives of internet users, so their captions consist of a few common words and many rare words. Most existing studies focus on improving captioning performance over the whole dataset and pay little attention to performance on rare words. To address this problem, we introduce the task of long-tail image captioning and propose a dynamic semantic memory network (DSMN). Long-tail image captioning requires a model to improve the generation of rare words while maintaining good performance on common words. DSMN dynamically mines the global semantic relationships between rare words and common words, enabling knowledge transfer from common words to rare words. Results show that DSMN improves the semantic representation of rare words by combining global word-level semantic relations with the local semantic information of the input image and the generated words. To better evaluate long-tail image captioning, we construct a task-specific test split, Few-COCO, from the original MS COCO Captioning dataset. In quantitative and qualitative experiments on Few-COCO, DSMN achieves a rare-word description precision of 0.6028%, a recall of 0.3234%, and an F1 value of 0.3567%, a significant improvement over the baseline methods. © 2022 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
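The abstract reports precision, recall, and F1 for rare-word description but does not define how they are computed. The following is a minimal sketch of one plausible reading, not the authors' evaluation protocol. It assumes that a word counts as rare when it occurs at most `max_count` times in the training captions, and that the scores compare the sets of rare words in a generated caption and a reference caption. The function names, the threshold, and the toy captions are all hypothetical.

```python
from collections import Counter


def rare_vocabulary(captions, max_count):
    """Return words occurring at most `max_count` times (assumed rarity rule)."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    return {word for word, n in counts.items() if n <= max_count}


def rare_word_prf(generated, reference, rare_words):
    """Set-based precision, recall and F1 restricted to rare words."""
    predicted = set(generated.lower().split()) & rare_words
    expected = set(reference.lower().split()) & rare_words
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy training captions (hypothetical data, for illustration only).
train = [
    "a dog runs on the grass",
    "a dog sits on the sofa",
    "a cat sits on the grass",
    "a corgi runs on the beach",
]
rare = rare_vocabulary(train, max_count=1)

scores = rare_word_prf(
    generated="a corgi sits on the sofa",
    reference="a corgi runs on the beach",
    rare_words=rare,
)
print(sorted(rare), scores)
```

In this toy case the generated caption recovers one of the two rare reference words ("corgi") and adds one that is not in the reference ("sofa"), so precision, recall, and F1 are each 0.5. A model that used only common words would score zero on all three, which is the behaviour a long-tail evaluation is meant to expose.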
Pages: 1399-1408