50 entries in total
- [42] Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 1313-1321
- [44] Dual Path Multi-Modal High-Order Features for Textual Content based Visual Question Answering. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 4324-4331
- [45] Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
- [46] Text-Guided Object Detector for Multi-modal Video Question Answering. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 1032-1042
- [47] Open-Ended Multi-Modal Relational Reasoning for Video Question Answering. 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, 2023: 363-369
- [48] MM-Reasoner: A Multi-Modal Knowledge-Aware Framework for Knowledge-Based Visual Question Answering. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 6571-6581
- [50] Essay-Anchor Attentive Multi-Modal Bilinear Pooling for Textbook Question Answering. 2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018