50 entries in total
- [44] GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696: 167-184
- [48] RESTHT: relation-enhanced spatial-temporal hierarchical transformer for video captioning. VISUAL COMPUTER, 2025, 41 (01): 591-604
- [50] Modeling Context-Guided Visual and Linguistic Semantic Feature for Video Captioning. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895: 677-689