14 results in total
- [1] Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 9489-9498
- [2] STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1513-1522
- [3] Weakly-Supervised Grounding for VQA with Dual Visual-Linguistic Interaction. ARTIFICIAL INTELLIGENCE, CICAI 2023, PT I, 2024, 14473: 156-169
- [5] Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018: 922-933
- [7] Towards Visual Storytelling by Understanding Narrative Context Through Scene-Graphs. MULTIMEDIA MODELING, MMM 2025, PT IV, 2025, 15523: 226-239
- [8] Towards situated speech understanding: visual context priming of language models. COMPUTER SPEECH AND LANGUAGE, 2005, 19(2): 227-248
- [9] Effects of Response Language and Stimulus Context upon Judgments of Visual and Temporal Extent. AMERICAN JOURNAL OF PSYCHOLOGY, 1983, 96(3): 365-375
- [10] Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 1588-1606