c-RNN: A Fine-Grained Language Model for Image Captioning

Cited by: 9
Authors
Huang, Gengshi [1]
Hu, Haifeng [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Character-level; Convolutional Neural Network; Recurrent Neural Network; Sequence learning;
DOI
10.1007/s11063-018-9836-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Earlier captioning methods built on the conventional deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architecture follow translation systems and use word-level modelling. Word-level modelling, however, depends on a good word segmentation algorithm to split sentences into words, which is a very difficult task. In this paper, we build a character-level RNN (c-RNN) that models captions directly as character sequences, so that a descriptive sentence is composed as a flow of characters. The c-RNN performs the language task at a finer level and naturally avoids the word segmentation issue. It enables the language model to reason dynamically about word spelling as well as grammatical rules, which results in expressive and elaborate sentences. We optimize the parameters of the neural networks by maximizing the probability of the correctly generated character-level sentences. Quantitative and qualitative experiments on the most popular datasets, MSCOCO and Flickr30k, show that our c-RNN can describe images considerably faster while maintaining satisfactory quality.
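To make the idea of composing a caption as a flow of characters concrete, the following is a minimal sketch of character-level caption encoding. It is not the authors' implementation: the function names, the `<s>` and `</s>` boundary symbols, and the two sample captions are all assumptions made for illustration. It only shows why no word segmentation step is needed, since each character, including the space, becomes one symbol.

```python
# Hypothetical illustration only; not the code used in the paper.
START, END = "<s>", "</s>"  # assumed sentence-boundary symbols


def build_char_vocab(captions):
    """Map every character seen in the captions (spaces included) to an integer id."""
    chars = sorted({ch for caption in captions for ch in caption})
    return {sym: idx for idx, sym in enumerate([START, END] + chars)}


def encode(caption, vocab):
    """Turn a caption into a list of character ids, framed by START and END."""
    return [vocab[START]] + [vocab[ch] for ch in caption] + [vocab[END]]


def decode(ids, vocab):
    """Invert encode(), dropping the START and END symbols."""
    inverse = {idx: sym for sym, idx in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in (START, END))


# Made-up sample captions, used only to exercise the functions.
captions = ["a dog runs", "a cat sits"]
vocab = build_char_vocab(captions)
ids = encode(captions[0], vocab)
```

In this sketch the vocabulary stays small because it grows with the number of distinct characters rather than the number of distinct words, and a character-level language model would be trained to predict each id in `ids` from the ids before it.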
Pages: 683-691
Number of pages: 9