50 results in total
- [1] Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality COMPUTER VISION - ECCV 2024, PT XV, 2025, 15073 : 42 - 59
- [2] AVQA: A Dataset for Audio-Visual Question Answering on Videos PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3480 - 3491
- [3] COCA: COllaborative CAusal Regularization for Audio-Visual Question Answering THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 12995 - 13003
- [5] Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3306 - 3314
- [6] Progressive Spatio-temporal Perception for Audio-Visual Question Answering PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7808 - 7816
- [7] Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2011 - 2021
- [10] Leveraging Modality-Specific Representations for Audio-Visual Speech Recognition via Reinforcement Learning THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 12607 - +