Automatic Caption Generation for News Images

Cited by: 61
Authors
Feng, Yansong [1]
Lapata, Mirella [2]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, 128 Zhong Guan Cun N St, Beijing 100871, Peoples R China
[2] Univ Edinburgh, Informat Forum, Inst Language Cognit & Computat, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
Keywords
Caption generation; image annotation; summarization; topic models; natural language
DOI
10.1109/TPAMI.2012.118
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Examples include video and image retrieval as well as the development of tools that aid visually impaired individuals to access pictorial information. Our approach leverages the vast resource of pictures available on the web and the fact that many of them are captioned and colocated with thematically related documents. Our model learns to create captions from a database of news articles, the pictures embedded in them, and their captions, and consists of two stages. Content selection identifies what the image and accompanying article are about, whereas surface realization determines how to verbalize the chosen content. We approximate content selection with a probabilistic image annotation model that suggests keywords for an image. The model postulates that images and their textual descriptions are generated by a shared set of latent variables (topics) and is trained on a weakly labeled dataset (which treats the captions and associated news articles as image labels). Inspired by recent work in summarization, we propose extractive and abstractive surface realization models. Experimental results show that it is viable to generate captions that are pertinent to the specific content of an image and its associated article, while permitting creativity in the description. Indeed, the output of our abstractive model compares favorably to handwritten captions and is often superior to extractive methods.
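The two-stage idea summarized in the abstract (content selection, then surface realization) can be illustrated with a toy sketch. It assumes the topic proportions `theta` for an image/article pair and the per-topic word distributions `phi` have already been estimated (they are hard-coded here). All function names are hypothetical, and the sentence-scoring rule (summed keyword probability over a sentence's distinct words) is a simple stand-in chosen for illustration, not the extractive or abstractive model proposed in the paper.

```python
import re
from collections import Counter


def keyword_distribution(theta, phi):
    """Content selection sketch: p(w | image, article) = sum_k theta[k] * phi[k][w].

    theta: topic proportions for one image/article pair (assumed given).
    phi:   list of per-topic word distributions (assumed given).
    """
    probs = Counter()
    for k, weight in enumerate(theta):
        for word, p in phi[k].items():
            probs[word] += weight * p
    return dict(probs)


def top_keywords(probs, n):
    """Return the n most probable words; ties are broken alphabetically."""
    ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def extractive_caption(sentences, probs):
    """Extractive realization sketch: return the article sentence whose
    distinct words carry the most keyword probability mass."""
    def score(sentence):
        return sum(probs.get(w, 0.0) for w in set(tokenize(sentence)))
    return max(sentences, key=score)


if __name__ == "__main__":
    # Hypothetical toy data: two topics (politics, weather).
    theta = [0.7, 0.3]
    phi = [
        {"minister": 0.4, "election": 0.4, "vote": 0.2},
        {"flood": 0.5, "rain": 0.3, "river": 0.2},
    ]
    article = [
        "The minister conceded the election after the vote.",
        "Heavy rain caused the river to flood.",
        "Officials met on Tuesday.",
    ]
    probs = keyword_distribution(theta, phi)
    print(top_keywords(probs, 3))  # ['election', 'minister', 'flood']
    print(extractive_caption(article, probs))
```

An extractive method like this can only reuse sentences already in the article; the abstract's point is that an abstractive model, which composes new text from the selected content, can produce captions closer to handwritten ones.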
Pages: 797-812
Page count: 16
Related papers
50 in total
  • [31] Evaluation of Automatic Caption Segmentation
    Waller, James M.
    Kushalnagar, Raja S.
    ASSETS'16: PROCEEDINGS OF THE 18TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2016, : 331 - 332
  • [32] Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images
    Ushiku, Yoshitaka
    Yamaguchi, Masataka
    Mukuta, Yusuke
    Harada, Tatsuya
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2668 - 2676
  • [33] ImageCLEF 2021 Best of Labs: The Curious Case of Caption Generation for Medical Images
    Nicolson, Aaron
    Dowling, Jason
    Koopman, Bevan
    EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION (CLEF 2022), 2022, 13390 : 190 - 203
  • [34] A Method of Caption Detection in News Video
    Huang, He
    Shi, Ping
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON MULTIMEDIA TECHNOLOGY (ICMT-13), 2013, 84 : 502 - 509
  • [35] Semantic summary automatic generation in news event
    Liu, Weidong
    Luo, Xiangfeng
    Zhang, Jun
    Xue, Ruirong
    Xu, Richard Yi Da
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (24):
  • [37] Automatic feature generation in endoscopic images
    Klank, Ulrich
    Padoy, Nicolas
    Feussner, Hubertus
    Navab, Nassir
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2008, 3 (3-4) : 331 - 339
  • [38] Automatic text extraction in news images using morphology
    Jang, IY
    Ko, BC
    Byun, H
    Choi, YW
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2002, PTS 1 AND 2, 2002, 4671 : 521 - 530
  • [39] Automatic Image and Video Caption Generation With Deep Learning: A Concise Review and Algorithmic Overlap
    Amirian, Soheyla
    Rasheed, Khaled
    Taha, Thiab R.
    Arabnia, Hamid R.
    IEEE ACCESS, 2020, 8 : 218386 - 218400
  • [40] TVPRNN for image caption generation
    Yang, Liang
    Hu, Haifeng
    ELECTRONICS LETTERS, 2017, 53 (22) : 1471 - +