c-RNN: A Fine-Grained Language Model for Image Captioning

Cited by: 9

Authors:

Huang, Gengshi [1]

Hu, Haifeng [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2019 / Vol. 49 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Image captioning; Character-level; Convolutional Neural Network; Recurrent Neural Network; Sequence learning;

DOI:

10.1007/s11063-018-9836-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Earlier captioning methods based on the conventional deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architecture follow translation systems in using word-level modelling. Word-level modelling, however, depends on a good word segmentation algorithm to split sentences into words, which is a very difficult task. In this paper, we build a character-level RNN (c-RNN) that models captions directly at the character level, so that each descriptive sentence is composed as a stream of characters. The c-RNN performs the language task at a finer level and naturally avoids the word segmentation issue. It enables the language model to reason dynamically about word spelling as well as grammatical rules, which results in expressive and elaborate sentences. We optimize the parameters of the neural networks by maximizing the probability of the correct character sequences. Quantitative and qualitative experiments on the most popular datasets, MSCOCO and Flickr30k, show that our c-RNN can describe images with considerably faster speed and satisfactory quality.
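As a rough illustration of what modelling captions "as a stream of characters" could look like on the data side, the sketch below encodes a caption as a sequence of character ids without any word segmentation step. It is not the authors' code: the function names (`build_char_vocab`, `encode_caption`, `decode_ids`), the `<s>` and `</s>` boundary symbols, and the toy captions are all assumptions made for this example.

```python
# Hypothetical sketch of character-level caption encoding.
# All names and the boundary symbols are assumptions, not taken from the paper.

START, END = "<s>", "</s>"  # assumed sequence boundary symbols


def build_char_vocab(captions):
    """Map the boundary symbols and every character seen in the captions to an id."""
    chars = sorted({ch for caption in captions for ch in caption})
    symbols = [START, END] + chars
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode_caption(caption, vocab):
    """Encode a caption as character ids framed by START and END.

    Spaces are ordinary characters here, so no word segmentation is needed.
    """
    return [vocab[START]] + [vocab[ch] for ch in caption] + [vocab[END]]


def decode_ids(ids, vocab):
    """Invert encode_caption, dropping the boundary symbols."""
    inverse = {idx: sym for sym, idx in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in (START, END))


captions = ["a dog runs", "a cat sits"]  # toy data, not from MSCOCO or Flickr30k
vocab = build_char_vocab(captions)
ids = encode_caption(captions[0], vocab)
```

A character-level language model would then be trained to predict each id in `ids` from the preceding ids and the image features. The vocabulary stays small (here 12 distinct characters plus 2 boundary symbols), at the cost of longer sequences than a word-level model would see.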

Pages: 683 - 691

Number of pages: 9

50 items in total

[21] High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention
Zhang, Zongjian
Wu, Qiang
Wang, Yang
Chen, Fang
IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (07) : 1681 - 1693
[22] CASCADE ATTENTION FUSION FOR FINE-GRAINED IMAGE CAPTIONING BASED ON MULTI-LAYER LSTM
Wang, Shuang
Meng, Yun
Gu, Yu
Zhang, Lei
Ye, Xiutiao
Tian, Jingxian
Jiao, Licheng
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2245 - 2249
[23] Attention-Guided Hierarchical Parsing for Fine-Grained Person-Centric Image Captioning
Gu, Zhengcheng
Jin, Jing
IEEE ACCESS, 2024, 12 : 86293 - 86301
[24] Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning
Hossen, Md. Bipul
Ye, Zhongfu
Abdussalam, Amr
Ul Hassan, Shabih
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
[25] Fine-Grained Length Controllable Video Captioning With Ordinal Embeddings
Nitta, Tomoya
Fukuzawa, Takumi
Tamaki, Toru
IEEE ACCESS, 2024, 12 : 189667 - 189688
[26] Object Localization Based on Natural Language Descriptions for Fine-Grained Image
Duan, Lijuan
Liang, Mingliang
En, Qing
Qiao, Yuanhua
Miao, Jun
Ma, Longlong
INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND ROBOTICS 2020, 2020, 11574
[27] Fine-Grained Fish Disease Image Recognition Algorithm Model
Wei Liming
Zhao Kui
Wang Ning
Zhang Zhongyan
Cui Haipeng
LASER & OPTOELECTRONICS PROGRESS, 2023, 60 (16)
[28] Fine-Grained Image Classification with Object-Part Model
Hong, Jinlong
Huang, Kaizhu
Liang, Hai-Ning
Wang, Xinheng
Zhang, Rui
ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, 2020, 11691 : 233 - 243
[29] Fine-Grained Image Classification Model Based on Improved Transformer
Tian Zhansheng
Liu Libo
LASER & OPTOELECTRONICS PROGRESS, 2023, 60 (02)
[30] A Survey of Fine-Grained Image Categorization
Zheng, Min
Li, Qingyong
Geng, Yangli-ao
Yu, Haomin
Wang, Jianzhu
Gan, Jinrui
Xue, Wenyuan
PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 533 - 538