Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Cited by: 3
Authors
Benotti, Luciana [1 ,2 ]
Lau, Tessa [3 ]
Villalba, Martin [1 ,4 ]
Affiliations
[1] Univ Nacl Cordoba, Cordoba, Argentina
[2] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina
[3] Savioke Inc, Sunnyvale, CA USA
[4] Univ Potsdam, D-14476 Potsdam, Germany
Keywords
Design; Algorithms; Performance; Natural language interpretation; multimodal understanding; action recognition; visual feedback; situated virtual agent; unsupervised learning
DOI
10.1145/2629632
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. This algorithm is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
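The abstract's core idea can be sketched in code: interpretation is classification over the actions possible in the current situation, scoring each candidate by combining language, vision, and behavior evidence. The data structures, weights, and toy data below are illustrative assumptions for this record, not the authors' implementation.

```python
# Minimal sketch: pick the best action among those possible in the situation
# by combining three evidence channels (language, vision, behavior).
from dataclasses import dataclass, field


@dataclass
class CandidateAction:
    name: str                                   # action identifier, e.g. "press(red_button)"
    keywords: set = field(default_factory=set)  # language channel: words tied to this action
    visible: bool = False                       # vision channel: is the target visible now?
    behavior_freq: float = 0.0                  # behavior channel: how often followers chose it here


def score(instruction: str, action: CandidateAction,
          w_lang: float = 1.0, w_vis: float = 0.5, w_beh: float = 0.5) -> float:
    """Combine the three evidence channels into one score (weights are assumptions)."""
    tokens = set(instruction.lower().split())
    lang = len(tokens & action.keywords) / max(len(action.keywords), 1)
    return w_lang * lang + w_vis * float(action.visible) + w_beh * action.behavior_freq


def interpret(instruction: str, possible_actions: list) -> str:
    """Classify: choose the highest-scoring candidate action."""
    return max(possible_actions, key=lambda a: score(instruction, a)).name


actions = [
    CandidateAction("press(red_button)", {"press", "red", "button"},
                    visible=True, behavior_freq=0.6),
    CandidateAction("open(door)", {"open", "door"},
                    visible=False, behavior_freq=0.2),
]
print(interpret("press the red button", actions))  # → press(red_button)
```

In the paper's setting the training labels for such a classifier come from the automatic planning-based annotation phase rather than manual labeling, and human feedback can rerank or veto the chosen action.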
Pages: 22
Related Papers
50 items in total
  • [21] Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
    Bai, Shuai
    Zheng, Zhedong
    Wang, Xiaohan
    Lin, Junyang
    Zhang, Zhu
    Zhou, Chang
    Yang, Hongxia
    Yang, Yi
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 4029 - 4038
  • [22] Interpreting vision and language generative models with semantic visual priors
    Cafagna, Michele
    Rojas-Barahona, Lina M.
    van Deemter, Kees
    Gatt, Albert
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2023, 6
  • [23] NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
    Sammani, Fawaz
    Mukherjee, Tanmoy
    Deligiannis, Nikos
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8312 - 8322
  • [24] Natural behavior is the language of the brain
    Miller, Cory T.
    Gire, David
    Hoke, Kim
    Huk, Alexander C.
    Kelley, Darcy
    Leopold, David A.
    Smear, Matthew C.
    Theunissen, Frederic
    Yartsev, Michael
    Niell, Cristopher M.
    CURRENT BIOLOGY, 2022, 32 (10) : R482 - R493
  • [25] Interpreting natural language descriptions of the topological relations of enclaves
    Wang, Xiaonan
    Zhang, Xiuyuan
    JOURNAL OF GEOGRAPHICAL SYSTEMS, 2025, : 301 - 335
  • [27] Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera
    Bao, Jiatong
    Jia, Yunyi
    Cheng, Yu
    Tang, Hongru
    Xi, Ning
    SENSORS, 2016, 16 (12):
  • [28] Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight
    Blukis, Valts
    Terme, Yannick
    Niklasson, Eyvind
    Knepper, Ross A.
    Artzi, Yoav
    CONFERENCE ON ROBOT LEARNING, VOL 100, 2019, 100
  • [29] Natural language texts for a cognitive vision system
    Arens, M
    Ottlik, A
    Nagel, HH
    ECAI 2002: 15TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2002, 77 : 455 - 459
  • [30] Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
    Iki, Taichi
    Aizawa, Akiko
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 2189 - 2196