50 entries in total
- [31] Video Question Generation via Semantic Rich Cross-Modal Self-Attention Networks Learning. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 2423-2427
- [32] RGB-D Saliency Detection Based on Attention Mechanism and Multi-Scale Cross-Modal Fusion. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35(06): 893-902
- [37] Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams. INTERSPEECH 2021, 2021: 321-325
- [40] CCMA: CapsNet for audio-video sentiment analysis using cross-modal attention. Visual Computer, 2025, 41(03): 1609-1620