- [11] Fei Z C., Better understanding hierarchical visual relationship for image caption, (2019)
- [12] Lee K H, Palangi H, Chen X, et al., Learning visual relation priors for image-text matching and image captioning with neural scene graph generators, (2019)
- [13] Yao T, Pan Y W, Li Y H, et al., Hierarchy parsing for image captioning, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2621-2629, (2019)
- [14] He S, Liao W T, Tavakoli H R, et al., Image captioning through image transformer, Computer Vision-ACCV 2020, pp. 153-169, (2021)
- [15] Zhang H B, Jiang Z L, Xiong Q P, et al., Image attribute annotation via a modified effective range based gene selection and cross-modal semantics mining, Acta Electronica Sinica, 48, 4, pp. 790-799, (2020)
- [16] Chen F H, Ji R R, Sun X S, et al., GroupCap: group-based image captioning with structured relevance and diversity constraints, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1345-1353, (2018)
- [17] Pasunuru R, Bansal M., Multi-task video captioning with video and entailment generation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1273-1283, (2017)
- [18] Zhou L W, Palangi H, Zhang L, et al., Unified vision-language pre-training for image captioning and VQA, Proceedings of the AAAI Conference on Artificial Intelligence, 34, 7, pp. 13041-13049, (2020)
- [19] Wang Y F, Lin Z, Shen X H, et al., Skeleton key: Image captioning by skeleton-attribute decomposition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7378-7387, (2017)
- [20] Lu J S, Yang J W, Batra D, et al., Neural baby talk, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7219-7228, (2018)