Multi-view pedestrian captioning with an attention topic CNN model

Cited by: 7
Authors
Liu, Quan [1, 3, 4]
Chen, Yingying [1, 2]
Wang, Jinqiao [1, 2]
Zhang, Sijiong [1, 3, 4]
Affiliations
[1] Univ Chinese Acad Sci, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[3] Chinese Acad Sci, Nanjing Inst Astron Opt & Technol, Natl Astron Observ, Nanjing 210042, Jiangsu, Peoples R China
[4] Chinese Acad Sci, Nanjing Inst Astron Opt & Technol, Key Lab Astron Opt & Technol, Nanjing 210042, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Pedestrian description; Multi-view captions;
DOI
10.1016/j.compind.2018.01.015
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Image captioning is a fundamental task connecting computer vision and natural language processing. Recent research usually concentrates on generic image captioning or video captioning across thousands of classes. However, such methods fail to capture detailed semantics and cannot effectively handle a specific class of objects, such as pedestrians. Pedestrian captioning plays a critical role in the analysis, identification and retrieval of massive collections of video data. Therefore, in this paper, we propose a novel approach to generate multi-view captions for pedestrian images with a topic attention mechanism on global and local semantic regions. First, we detect different local parts of the pedestrian and utilize a deep convolutional neural network (CNN) to extract a series of features from these local regions and from the whole image. Then, we aggregate these features with a topic attention CNN model to produce a representative vector that richly expresses the image from a different view at each time step. This feature vector is taken as input to a hierarchical recurrent neural network to generate multi-view captions for pedestrian images. Finally, a new dataset named CASIA_Pedestrian, comprising 5000 pairs of pedestrian images and sentences, is collected to evaluate the performance of pedestrian captioning. Experiments and comparison results show the superiority of our proposed approach. (C) 2018 Elsevier B.V. All rights reserved.
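The aggregation step described in the abstract, which pools global and local region features into one vector per time step, can be illustrated with generic soft-attention pooling. This is a minimal sketch under stated assumptions, not the authors' topic attention CNN. It swaps in plain dot-product attention, and the region names, feature values, and per-step "topic" vectors are hypothetical placeholders. Because each time step uses a different topic vector, the same set of region features is pooled into a different view-specific vector.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions: Dict[str, Vector], topic: Vector) -> Vector:
    """Score each region feature against the topic vector, normalize the
    scores with softmax, and return the attention-weighted sum of features.

    Assumption: dot-product scoring stands in for the paper's topic
    attention CNN, whose exact form is not given in the abstract.
    """
    feats = list(regions.values())
    weights = softmax([dot(f, topic) for f in feats])
    dim = len(feats[0])
    return [sum(w * f[d] for w, f in zip(weights, feats)) for d in range(dim)]


def multi_view_vectors(regions: Dict[str, Vector],
                       topics: Sequence[Vector]) -> List[Vector]:
    """Return one pooled vector per time step, each step using its own
    (hypothetical) topic vector."""
    return [attend(regions, t) for t in topics]


# Hypothetical 3-D features for the whole image and three detected parts.
regions = {
    "whole_image": [1.0, 1.0, 1.0],
    "head": [1.0, 0.0, 0.0],
    "upper_body": [0.0, 1.0, 0.0],
    "lower_body": [0.0, 0.0, 1.0],
}
# Hypothetical topic vectors, one per time step (view).
topics = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]]

for step, vec in enumerate(multi_view_vectors(regions, topics)):
    print(step, [round(v, 3) for v in vec])
```

In this toy setup the first step concentrates attention on the whole-image and head features, and the second shifts it to the whole-image and lower-body features. In the described approach, such per-step vectors would then feed a hierarchical recurrent neural network, which is not sketched here.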
Pages: 47-53
Number of pages: 7
Related papers (50 in total)
  • [21] Multi-view stereo network with point attention
    Zhao, Rong
    Gu, Zhuoer
    Han, Xie
    He, Ligang
    Sun, Fusheng
    Jiao, Shichao
    APPLIED INTELLIGENCE, 2023, 53 (22) : 26622 - 26636
  • [23] Multi-view self-attention networks
    Xu, Mingzhou
    Yang, Baosong
    Wong, Derek F.
    Chao, Lidia S.
    KNOWLEDGE-BASED SYSTEMS, 2022, 241
  • [24] Attention-Aware Multi-View Stereo
    Luo, Keyang
    Guan, Tao
    Ju, Lili
    Wang, Yuesong
    Chen, Zhuo
    Luo, Yawei
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 1587 - 1596
  • [25] Multi-View Attention Network for Visual Dialog
    Park, Sungjin
    Whang, Taesun
    Yoon, Yeochan
    Lim, Heuiseok
    APPLIED SCIENCES-BASEL, 2021, 11 (07):
  • [26] MGAT: Multi-view Graph Attention Networks
    Xie, Yu
    Zhang, Yuanqiao
    Gong, Maoguo
    Tang, Zedong
    Han, Chao
    NEURAL NETWORKS, 2020, 132 : 180 - 189
  • [27] Multi-view clustering based on view-attention driven
    Ma, Zhifeng
    Yu, Junyang
    Wang, Longge
    Chen, Huazhu
    Zhao, Yuxi
    He, Xin
    Wang, Yingqi
    Song, Yalin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (08) : 2621 - 2631
  • [29] Multi-view multi-label learning with view feature attention allocation
    Cheng, Yusheng
    Li, Qingyan
    Wang, Yibin
    Zheng, Weijie
    NEUROCOMPUTING, 2022, 501 : 857 - 874
  • [30] A multi-view pedestrian tracking method in an uncalibrated camera network
    Varga, Domonkos
    Sziranyi, Tamas
    Kiss, Attila
    Sporas, Laszlo
    Havasi, Laszlo
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP (ICCVW), 2015, : 184 - 191