Automatic Caption Generation for News Images

Cited by: 61
Authors
Feng, Yansong [1]
Lapata, Mirella [2]
Affiliations
[1] Peking University, Institute of Computer Science and Technology, 128 Zhong Guan Cun North Street, Beijing 100871, People's Republic of China
[2] University of Edinburgh, Informatics Forum, Institute for Language, Cognition and Computation, School of Informatics, Edinburgh EH8 9AB, Midlothian, Scotland
Keywords
Caption generation; image annotation; summarization; topic models; natural language
DOI
10.1109/TPAMI.2012.118
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Examples include video and image retrieval as well as the development of tools that aid visually impaired individuals to access pictorial information. Our approach leverages the vast resource of pictures available on the web and the fact that many of them are captioned and colocated with thematically related documents. Our model learns to create captions from a database of news articles, the pictures embedded in them, and their captions, and consists of two stages. Content selection identifies what the image and accompanying article are about, whereas surface realization determines how to verbalize the chosen content. We approximate content selection with a probabilistic image annotation model that suggests keywords for an image. The model postulates that images and their textual descriptions are generated by a shared set of latent variables (topics) and is trained on a weakly labeled dataset (which treats the captions and associated news articles as image labels). Inspired by recent work in summarization, we propose extractive and abstractive surface realization models. Experimental results show that it is viable to generate captions that are pertinent to the specific content of an image and its associated article, while permitting creativity in the description. Indeed, the output of our abstractive model compares favorably to handwritten captions and is often superior to extractive methods.
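The two-stage idea in the abstract can be illustrated with a minimal sketch of extractive surface realization. It assumes content selection has already produced weighted keywords for the image, and it picks the article sentence that carries the most keyword weight. The function names, the overlap-based scoring rule, and the toy article and weights are all hypothetical illustrations, not the authors' model or data.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_caption(sentences, keyword_scores):
    """Return the sentence whose distinct words carry the most keyword weight.

    keyword_scores maps keyword -> weight. It is a hypothetical stand-in
    for the output of an image annotation model (the content selection stage).
    """
    def score(sentence):
        words = set(tokenize(sentence))
        return sum(keyword_scores.get(word, 0.0) for word in words)

    return max(sentences, key=score)


# Toy article sentences and keyword weights, invented for illustration only.
article = [
    "The prime minister arrived in Edinburgh on Monday.",
    "Crowds gathered outside the parliament building to protest.",
    "Officials declined to comment on the talks.",
]
keywords = {"crowds": 0.4, "parliament": 0.3, "protest": 0.2, "minister": 0.1}

caption = select_caption(article, keywords)
print(caption)
```

An extractive approach of this kind can only reuse a sentence that already exists in the article, which is the limitation the abstractive models described in the abstract aim to overcome.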
Pages: 797-812
Page count: 16