Image captioning takes an image as input and outputs a text sequence. Most images in current captioning datasets are drawn from the daily life of internet users, so their captions consist of a few common words and many rare words. Most existing studies aim to improve captioning performance over the whole dataset, disregarding performance on rare words. To address this problem, we introduce long-tail image captioning with a dynamic semantic memory network (DSMN). Long-tail image captioning requires the model to improve the generation of rare words while maintaining good performance on common words. DSMN dynamically mines the global semantic relationship between rare words and common words, enabling knowledge transfer from common words to rare words. It improves the semantic representation of rare words by combining global word-level semantic relations with the local semantic information of the input image and the previously generated words. For better evaluation of long-tail image captioning, we construct a task-specific test split, Few-COCO, from the original MS COCO Captioning dataset. Quantitative and qualitative experiments show that on Few-COCO the DSMN model achieves a rare-word description precision of 0.6028%, a recall of 0.3234%, and an F-1 score of 0.3567%, a significant improvement over baseline methods.
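The abstract does not detail DSMN's internals, but the core idea it states (attending over global common-word semantics, conditioned on local image and language context, to enrich rare-word representations) can be illustrated with a minimal sketch. All names and design choices below (the SemanticMemoryRead module, a learnable slot memory, the num_slots parameter) are hypothetical assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticMemoryRead(nn.Module):
    """Hypothetical sketch: attends over a bank of common-word semantic
    vectors to enrich the representation of a (possibly rare) word,
    conditioned on local context from the image and generated words."""
    def __init__(self, embed_dim: int, num_slots: int):
        super().__init__()
        # Learnable memory assumed to hold global semantics of common words.
        self.memory = nn.Parameter(torch.randn(num_slots, embed_dim))
        self.query_proj = nn.Linear(2 * embed_dim, embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, word_embed: torch.Tensor, local_context: torch.Tensor) -> torch.Tensor:
        # word_embed, local_context: (batch, embed_dim)
        # Build a query from the word and its local visual/linguistic context.
        query = self.query_proj(torch.cat([word_embed, local_context], dim=-1))
        attn = F.softmax(query @ self.memory.t(), dim=-1)  # (batch, num_slots)
        read = attn @ self.memory                          # (batch, embed_dim)
        # Fuse the retrieved common-word semantics into the word representation,
        # i.e., transfer knowledge from common words to rare words.
        return self.fuse(torch.cat([word_embed, read], dim=-1))

# Usage: enrich a rare word's embedding before feeding it to the caption decoder.
module = SemanticMemoryRead(embed_dim=512, num_slots=256)
word = torch.randn(4, 512)     # embeddings of current tokens
context = torch.randn(4, 512)  # local context, e.g., attended image features
enriched = module(word, context)
print(enriched.shape)  # torch.Size([4, 512])
```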