STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

Cited by: 47
Authors
Yoshikawa, Yuya [1]
Shigeto, Yutaro [1]
Takeuchi, Akikazu [1]
Affiliation
[1] Chiba Inst Technol, STAIR Lab, 2-17-1 Tsudanuma, Narashino, Chiba, Japan
DOI
10.18653/v1/P17-2066
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In recent years, the automatic generation of image descriptions (captions), known as image captioning, has attracted a great deal of attention. In this paper, we focus on generating Japanese captions for images. Because most available caption datasets have been constructed for English, few datasets exist for Japanese. To address this problem, we construct a large-scale Japanese image caption dataset, called STAIR Captions, based on images from MS-COCO. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In our experiments, we show that a neural network trained on STAIR Captions generates more natural and higher-quality Japanese captions than those obtained by first generating English captions and then applying English-Japanese machine translation.
Pages: 417-421
Page count: 5