50 items in total
- [41] Remember and forget: video and text fusion for video question answering. Multimedia Tools and Applications, 2018, 77: 29269-29282
- [42] Multimodal Graph Networks for Compositional Generalization in Visual Question Answering. Advances in Neural Information Processing Systems 33, NeurIPS 2020, 2020, 33
- [43] Compositional Task-Oriented Parsing as Abstractive Question Answering. NAACL 2022: The 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022: 4418-4427
- [44] Grounded Graph Decoding Improves Compositional Generalization in Question Answering. Findings of the Association for Computational Linguistics, EMNLP 2021, 2021: 1829-1838
- [45] Video question answering via traffic knowledge database and question classification. Multimedia Systems, 2024, 30
- [47] Question Difficulty Estimation with Directional Modality Association in Video Question Answering. Advances and Trends in Artificial Intelligence: Theory and Practices in Artificial Intelligence, 2022, 13343: 287-299
- [48] Learning Question-Guided Video Representation for Multi-Turn Video Question Answering. 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2019), 2019: 215-225
- [49] ViLA: Efficient Video-Language Alignment for Video Question Answering. Computer Vision - ECCV 2024, Pt LXII, 2025, 15120: 186-204
- [50] Knowledge Proxy Intervention for Deconfounded Video Question Answering. 2023 IEEE/CVF International Conference on Computer Vision, ICCV, 2023: 2770-2781