Large Scale Retrieval and Generation of Image Descriptions

Cited by: 48
Authors
Ordonez, Vicente [1]
Han, Xufeng [1]
Kuznetsova, Polina [2]
Kulkarni, Girish [2]
Mitchell, Margaret [3]
Yamaguchi, Kota [4]
Stratos, Karl [5]
Goyal, Amit [6]
Dodge, Jesse [7]
Mensch, Alyssa [8]
Daume, Hal, III [9]
Berg, Alexander C. [1]
Choi, Yejin [10]
Berg, Tamara L. [1]
Affiliations
[1] Univ N Carolina, Chapel Hill, NC 27599 USA
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
[3] Microsoft Res, Redmond, WA USA
[4] Tohoku Univ, Sendai, Miyagi, Japan
[5] Columbia Univ, New York, NY USA
[6] Yahoo Labs, Sunnyvale, CA USA
[7] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[8] Univ Penn, Philadelphia, PA 19104 USA
[9] Univ Maryland, College Pk, MD 20742 USA
[10] Univ Washington, Seattle, WA 98195 USA
Keywords
Retrieval; Image description; Data driven; Big data; Natural language processing; Scene
DOI
10.1007/s11263-015-0840-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
What is the story of an image? What is the relationship between pictures, language, and the information we can extract using state-of-the-art computational recognition systems? In an attempt to address both of these questions, we explore methods for retrieving and generating natural language descriptions for images. Ideally, we would like our generated textual descriptions (captions) both to sound like a person wrote them and to remain true to the image content. To do this, we develop data-driven approaches for image description generation, using retrieval-based techniques to gather either (a) whole captions associated with a visually similar image, or (b) relevant bits of text (phrases) from a large collection of image + description pairs. In the case of (b), we develop optimization algorithms to merge the retrieved phrases into valid natural language sentences. The end result is two simple but effective methods for harnessing the power of big data to produce image captions that are altogether more general, relevant, and human-like than previous attempts.
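Approach (a) amounts to nearest-neighbour caption transfer: find the database image that looks most like the query and reuse its caption. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes every image has already been reduced to a fixed-length feature vector and uses plain cosine similarity; the helper names `cosine_similarity` and `retrieve_captions`, the toy 3-dimensional vectors, and the captions are all hypothetical, and the paper's actual visual features and any re-ranking step are not described in this record.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_captions(query: Vector, database: List[Tuple[Vector, str]], k: int = 1) -> List[str]:
    """Return the captions of the k database images most similar to the query image.

    Hypothetical helper: `database` is assumed to be a list of
    (precomputed feature vector, caption) pairs.
    """
    ranked = sorted(database, key=lambda item: cosine_similarity(query, item[0]), reverse=True)
    return [caption for _, caption in ranked[:k]]


# Toy database: invented 3-d "visual features" paired with invented captions.
database = [
    ([0.9, 0.1, 0.0], "a dog running on the beach"),
    ([0.1, 0.8, 0.2], "a red car parked on the street"),
    ([0.0, 0.2, 0.9], "sunset over the mountains"),
]

if __name__ == "__main__":
    # The query vector is closest to the first entry, so its caption is transferred.
    print(retrieve_captions([0.8, 0.2, 0.1], database))
```

This brute-force ranking scores every database entry, so its cost grows linearly with collection size; at the scale of a large captioned photo collection, an approximate nearest-neighbour index would likely be preferred.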
Pages: 46-59
Number of pages: 14
Related papers
50 in total
  • [21] FOREST HASHING: EXPEDITING LARGE SCALE IMAGE RETRIEVAL
    Springer, Jonathan
    Xin, Xin
    Li, Zhu
    Watt, Jeremy
    Katsaggelos, Aggelos
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 1681 - 1684
  • [22] Deep binary codes for large scale image retrieval
    Wu, Song
    Oerlemans, Ard
    Bakker, Erwin M.
    Lew, Michael S.
    NEUROCOMPUTING, 2017, 257 : 5 - 15
  • [23] Deep Hashing for Large-scale Image Retrieval
    Li Mengting
    Liu Jun
    PROCEEDINGS OF THE 36TH CHINESE CONTROL CONFERENCE (CCC 2017), 2017, : 10940 - 10944
  • [24] Fusing local image descriptors for large-scale image retrieval
    Hoerster, Eva
    Lienhart, Rainer
    2007 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-8, 2007, : 3664 - +
  • [25] Image Retrieval from Contextual Descriptions
    Krojer, Benno
    Adlakha, Vaibhav
    Vineet, Vibhav
    Goyal, Yash
    Ponti, Edoardo
    Reddy, Siva
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3426 - 3440
  • [26] A Lightweight Framework for Fast Image Retrieval on Large-Scale Image Datasets
    Chen, Renhai
    Li, Wenwen
    Rao, Guozheng
    Feng, Zhiyong
    2020 9TH IEEE NON-VOLATILE MEMORY SYSTEMS AND APPLICATIONS SYMPOSIUM (NVMSA 2020), 2020, : 42 - 47
  • [27] Topic modeling and improvement of image representation for large-scale image retrieval
    Nguyen Anh Tu
    Dong-Luong Dinh
    Rasel, Mostofa Kamal
    Lee, Young-Koo
    INFORMATION SCIENCES, 2016, 366 : 99 - 120
  • [28] Large-scale Image Retrieval with Sparse Binary Projections
    Ma, Changyi
    Gu, Chonglin
    Li, Wenye
    Cui, Shuguang
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1817 - 1820
  • [29] Neighborhood Discriminant Hashing for Large-Scale Image Retrieval
    Tang, Jinhui
    Li, Zechao
    Wang, Meng
    Zhao, Ruizhen
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (09) : 2827 - 2840
  • [30] Coupled Binary Embedding for Large-Scale Image Retrieval
    Zheng, Liang
    Wang, Shengjin
    Tian, Qi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (08) : 3368 - 3380