Visual Object Search by Learning Spatial Context

Cited: 52
Authors
Druon, Raphael [1 ,2 ]
Yoshiyasu, Yusuke [2 ,3 ]
Kanezaki, Asako [3 ]
Watt, Alassane [2 ,4 ]
Affiliations
[1] Paul Sabatier Univ, F-31330 Toulouse, France
[2] CNRS AIST Joint Robot Lab, Tsukuba, Ibaraki 3058560, Japan
[3] Natl Inst Adv Ind Sci & Technol, Tokyo 1350064, Japan
[4] CentraleSupelec, Rennes, France
Keywords
Deep learning in robotics and automation; visual-based navigation; autonomous agents; obstacle avoidance
DOI
10.1109/LRA.2020.2967677
CLC Number
TP24 [Robotics]
Discipline Codes
080202; 1405
Abstract
We present a visual navigation approach that uses context information to navigate an agent to find and reach a target object. To learn context from the objects present in the scene, we transform visual information into an intermediate representation called a context grid, which encodes how semantically similar the object at each location is to the target object. Because this representation encodes the target object and the surrounding objects together, it lets the agent navigate in a human-inspired way: while the target is out of sight, the agent moves toward likely places by observing surrounding context objects, and once the target comes into view, it reaches the target quickly. Since the context grid does not directly contain visual or semantic feature values that change when new objects are introduced, such as new instances of the same object with a different appearance or objects from a slightly different class, our navigation model generalizes well to unseen scenes and objects. Experimental results show that our approach outperforms previous approaches at navigating unseen scenes, especially broad ones. We also evaluated human performance on the target-driven navigation task and compared it with learning-based navigation approaches, including ours.
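To make the representation concrete, below is a minimal Python sketch of how such a context grid could be computed. This is an illustration, not the authors' released code: the toy EMBEDDINGS table, the grid size, and the names cosine and context_grid are assumptions made here; the paper's pipeline would use real word vectors and per-cell class labels produced by a detector or semantic segmenter.

    import numpy as np

    # Toy word embeddings standing in for real ones (e.g., GloVe);
    # "background" gets the zero vector so empty cells score 0.
    EMBEDDINGS = {
        "background": np.array([0.0, 0.0, 0.0]),
        "sofa":       np.array([0.9, 0.1, 0.2]),
        "television": np.array([0.8, 0.3, 0.1]),
        "sink":       np.array([0.1, 0.9, 0.4]),
        "mug":        np.array([0.2, 0.8, 0.5]),
    }

    def cosine(u, v):
        # Cosine similarity; zero-norm inputs (background) return 0.
        nu, nv = np.linalg.norm(u), np.linalg.norm(v)
        return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

    def context_grid(class_labels, target):
        # Map an H x W grid of per-cell class names to a grid of
        # semantic similarities against the target class.
        t = EMBEDDINGS[target]
        return np.array([[cosine(EMBEDDINGS[c], t) for c in row]
                         for row in class_labels])

    # A 2 x 3 grid of detections while searching for a "mug":
    # cells around the "sink" score high, the "sofa" area scores low,
    # so the agent is drawn toward the semantically related region.
    labels = [["background", "sofa", "sink"],
              ["television", "sink", "background"]]
    print(context_grid(labels, target="mug").round(2))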
Pages: 1279-1286
Number of pages: 8
Related Papers
50 records in total
  • [11] Top-down strategy affects learning of visual context in visual search
    Endo, N.
    PERCEPTION, 2008, 37: 8-8
  • [12] Visual search is relational without prior context learning
    Becker, Stefanie I.
    Hamblin-Frohman, Zachary
    Amarasekera, Koralalage Don Raveen
    COGNITION, 2025, 260
  • [13] Learning Spatial Fusion and Matching for Visual Object Tracking
    Xiao, Wei
    Zhang, Zili
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2022, 13631: 352-367
  • [14] Spatial context and top-down strategies in visual search
    Lleras, A
    Von Mühlenen, A
    SPATIAL VISION, 2004, 17(4-5): 465-482
  • [15] Visual Attentional Network and Learning Method for Object Search and Recognition
    Lü J.
    Luo F.
    Yuan Z.
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2019, 55(11): 123-130
  • [16] Learning by selection: Visual search and object perception in young infants
    Amso, Dima
    Johnson, Scott P.
    DEVELOPMENTAL PSYCHOLOGY, 2006, 42(06): 1236-1245
  • [17] A model of spatial and object-based attention for active visual search
    Lanyon, L
    Denham, S
    MODELING LANGUAGE, COGNITION AND ACTION, 2005, 16: 239-248
  • [18] Object and Spatial Context Representations in Visual Short-Term Memory
    Li, Aedan Y.
    ENEURO, 2021, 8(02)
  • [19] Oculomotor correlates of context-guided learning in visual search
    Tseng, YC
    Li, CSR
    PERCEPTION & PSYCHOPHYSICS, 2004, 66 (08): : 1363 - 1378