50 entries in total
- [32] Audio-visual Speaker Recognition via Multi-modal Correlated Neural Networks. 2016 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE WORKSHOPS (WIW 2016), 2016: 123-128
- [33] Enhancing Visual Question Answering through Bi-Modal Feature Fusion: Performance Analysis. 6TH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND MACHINE VISION, IPMV 2024, 2024: 115-122
- [34] Cross-Modal Label Contrastive Learning for Unsupervised Audio-Visual Event Localization. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023: 215-222
- [35] Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation. INTERSPEECH 2022, 2022: 886-890
- [36] Temporal and Cross-modal Attention for Audio-Visual Zero-Shot Learning. COMPUTER VISION, ECCV 2022, PT XX, 2022, 13680: 488-505
- [37] Transfer Learning via Unsupervised Task Discovery for Visual Question Answering. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 8377-8386
- [40] Jointly Learning Attentions with Semantic Cross-Modal Correlation for Visual Question Answering. DATABASES THEORY AND APPLICATIONS, ADC 2017, 2017, 10538: 248-260