50 entries in total
- [2] Stacked cross-modal feature consolidation attention networks for image captioning. Multimedia Tools and Applications, 2024, 83: 12209-12233.
- [4] Learning Cross-modal Representations with Multi-relations for Image Captioning. Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods (ICPRAM), 2021: 346-353.
- [8] Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 4, 2024: 3864-3872.
- [10] XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 2: Short Papers, 2022: 479-489.