50 items in total
- [1] Stacked cross-modal feature consolidation attention networks for image captioning. Multimedia Tools and Applications, 2024, 83: 12209-12233
- [2] HCNet: Hierarchical Feature Aggregation and Cross-Modal Feature Alignment for Remote Sensing Image Captioning. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-11
- [3] Exploring and Distilling Cross-Modal Information for Image Captioning. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019: 5095-5101
- [5] Cross-modal recipe retrieval with stacked attention model. Multimedia Tools and Applications, 2018, 77: 29457-29473
- [10] Learning Cross-modal Representations with Multi-relations for Image Captioning. Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods (ICPRAM), 2021: 346-353