Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context

Cited by: 0
Authors
Paul, Rohan [1]
Barbu, Andrei [1]
Felshin, Sue [1]
Katz, Boris [1]
Roy, Nicholas [1]
Institution
[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
FRAMEWORK;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A robot's ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. From a natural language utterance, a probabilistic model estimates the objects, relations, and actions that the utterance refers to and the objectives for future robotic actions it implies; it then generates a plan to execute those actions while updating a state representation to include newly acquired knowledge from the visual-linguistic context. Grounding a command necessitates a representation for past observations and interactions; however, maintaining the full context consisting of all possible observed objects, attributes, spatial relations, actions, etc., over time is intractable. Instead, our model, Temporal Grounding Graphs, maintains a learned state representation for a belief over factual groundings, those derived from natural-language interactions, and lazily infers new groundings from visual observations using the context implied by the utterance. This work significantly expands the range of language that a robot can understand by incorporating factual knowledge and observations of its workspace into its inference about the meaning and grounding of natural-language utterances.
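The lazy-inference idea in the abstract can be illustrated with a toy sketch. All names below (`GroundingStore`, `is_left_of`, the sample observations) are hypothetical illustrations, not the paper's actual implementation: the point is only that factual groundings from dialogue are stored eagerly, while groundings over the visual history are computed on demand for the one predicate an utterance queries, rather than precomputed for every predicate over every object.

```python
from dataclasses import dataclass, field

@dataclass
class GroundingStore:
    """Toy illustration of lazy grounding (hypothetical, not the paper's code):
    factual groundings from past dialogue are stored eagerly; groundings over
    visual observations are inferred only when an utterance queries them."""
    facts: dict = field(default_factory=dict)         # from language interactions
    observations: list = field(default_factory=list)  # raw past percepts
    _cache: dict = field(default_factory=dict)        # lazily inferred groundings

    def assert_fact(self, key, value):
        # e.g. the user says "the red block is mine":
        # ("owner", "red_block") -> "me"
        self.facts[key] = value

    def ground(self, predicate, landmark):
        # Scan past observations for this one predicate only now,
        # instead of enumerating all predicates over all objects.
        key = (predicate.__name__, landmark)
        if key not in self._cache:
            self._cache[key] = [o for o in self.observations
                                if predicate(o, landmark)]
        return self._cache[key]

def is_left_of(obs, landmark):
    # Hypothetical spatial check against one recorded observation.
    return obs.get("relation") == "left_of" and obs.get("landmark") == landmark

store = GroundingStore()
store.observations = [
    {"object": "red_block", "relation": "left_of", "landmark": "box"},
    {"object": "cup", "relation": "on", "landmark": "table"},
]
# The utterance "pick up the block left of the box" triggers exactly
# one lazy grounding query over the stored visual history:
matches = store.ground(is_left_of, "box")
print([m["object"] for m in matches])
```

The cache means repeated references to the same relation in later utterances reuse the earlier inference, while relations never mentioned are never evaluated, which is the tractability argument the abstract makes.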
Pages: 4506-4514 (9 pages)