c-RNN: A Fine-Grained Language Model for Image Captioning

Cited by: 9
Authors
Huang, Gengshi [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Character-level; Convolutional Neural Network; Recurrent Neural Network; Sequence learning;
DOI
10.1007/s11063-018-9836-2
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previous captioning methods based on the conventional deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architecture follow the machine-translation paradigm and use word-level modelling. Word-level modelling, however, requires an optimal word segmentation algorithm to split each sentence into words, which is a very difficult task. In this paper, we build a character-level RNN (c-RNN) that models captions directly as character sequences, so that a descriptive sentence is composed as a flow of characters. The c-RNN performs the language task at a finer granularity and naturally avoids the word segmentation issue. It also empowers the language model to reason dynamically about word spelling as well as grammatical rules, which results in expressive and elaborate sentences. We optimize the parameters of the neural networks by maximizing the probability of the correctly generated character sequences. Quantitative and qualitative experiments on the popular MSCOCO and Flickr30k datasets show that our c-RNN describes images considerably faster and with satisfactory quality.
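As a hedged illustration of the character-level modelling described in the abstract (this is not the authors' published code): a minimal PyTorch sketch in which a CNN image feature initializes an LSTM that emits one character per step, trained by maximizing the log-probability of the reference character sequence. The CharCaptioner class, layer sizes, and the 70-symbol character vocabulary are illustrative assumptions.

import torch
import torch.nn as nn

class CharCaptioner(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # one embedding per character
        self.init_h = nn.Linear(feat_dim, hidden_dim)      # map CNN feature to initial LSTM state
        self.init_c = nn.Linear(feat_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # logits over the character set

    def forward(self, image_feat, char_ids):
        # image_feat: (B, feat_dim) CNN feature; char_ids: (B, T) caption characters
        h0 = self.init_h(image_feat).unsqueeze(0)          # (1, B, hidden_dim)
        c0 = self.init_c(image_feat).unsqueeze(0)
        emb = self.embed(char_ids)                         # (B, T, embed_dim)
        hidden, _ = self.lstm(emb, (h0, c0))               # (B, T, hidden_dim)
        return self.out(hidden)                            # (B, T, vocab_size)

# Training minimizes character-level cross-entropy with teacher forcing,
# i.e. maximizes the log-probability of the reference character sequence.
model = CharCaptioner(vocab_size=70)                       # ~70 symbols: letters, digits, punctuation
feats = torch.randn(4, 2048)                               # stand-in CNN features for 4 images
caps = torch.randint(0, 70, (4, 30))                       # stand-in characterized captions, length 30
logits = model(feats, caps[:, :-1])                        # predict the next character at each step
loss = nn.functional.cross_entropy(logits.reshape(-1, 70), caps[:, 1:].reshape(-1))
loss.backward()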
Pages: 683-691
Number of pages: 9