Visual Attentional Network and Learning Method for Object Search and Recognition

Cited by: 0
Authors
Lü J. [1], Luo F. [1], Yuan Z. [1]
Affiliations
[1] School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an
Source
Jixie Gongcheng Xuebao/Journal of Mechanical Engineering | 2019 / Vol. 55 / No. 11
Keywords
Attentional model; Fixation strategy; Object detection; Reinforcement learning
DOI
10.3901/JME.2019.11.123
Abstract
A recurrent visual attention network is proposed to search for and recognize an object simultaneously. The network automatically selects a sequence of local observations, and localizes and recognizes objects accurately by fusing fine-grained local appearance with coarse contextual information. This is more efficient than methods based on sliding windows or convolution over the whole image. In addition, a hybrid loss function is proposed to learn the parameters of the multi-task network end-to-end. In particular, a combination of stochastic and object-aware strategies is incorporated into the visual fixation loss, which helps mine richer context and drives the fixation point toward the object as quickly as possible. A real-world dataset is built to verify the method's capacity to search for and recognize objects of interest, including small ones. Experiments show that the method predicts accurate bounding boxes for visual objects and achieves a higher search speed. The source code will be released for verification and analysis of the method. © 2019 Journal of Mechanical Engineering.
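The glimpse-then-fixate loop the abstract describes can be sketched as a minimal, illustrative pure-Python toy: at each step a local patch is extracted at the current fixation, a recurrent state is updated, and the next fixation and a class prediction are emitted. All names, dimensions, and the random-weight setup here are assumptions for illustration; end-to-end training with the paper's hybrid loss (e.g. a REINFORCE-style fixation policy) is omitted.

```python
import math
import random

random.seed(0)

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def glimpse(image, loc, size=4):
    """Flattened size x size patch centred at loc, zero-padded at the borders."""
    H, W = len(image), len(image[0])
    r0, c0 = loc[0] - size // 2, loc[1] - size // 2
    return [image[r0 + i][c0 + j] if 0 <= r0 + i < H and 0 <= c0 + j < W else 0.0
            for i in range(size) for j in range(size)]

class RecurrentAttention:
    """Toy recurrent attention loop: glimpse -> recurrent state -> next
    fixation and class logits. Weights stay random here; the paper learns
    them end-to-end with a hybrid loss, which this sketch does not model."""
    def __init__(self, glimpse_dim=16, hidden=8, n_classes=2):
        self.Wg = rand_matrix(hidden, glimpse_dim)   # glimpse encoder
        self.Wh = rand_matrix(hidden, hidden)        # recurrent transition
        self.Wl = rand_matrix(2, hidden)             # fixation head
        self.Wc = rand_matrix(n_classes, hidden)     # classification head

    def run(self, image, start_loc, steps=4):
        H, W = len(image), len(image[0])
        h = [0.0] * len(self.Wh)
        loc, trace = start_loc, [start_loc]
        for _ in range(steps):
            g = glimpse(image, loc)
            h = [math.tanh(a + b)
                 for a, b in zip(matvec(self.Wg, g), matvec(self.Wh, h))]
            # Next fixation: squash the head output into image bounds.
            ly, lx = matvec(self.Wl, h)
            loc = (int((math.tanh(ly) + 1) / 2 * (H - 1)),
                   int((math.tanh(lx) + 1) / 2 * (W - 1)))
            trace.append(loc)
        logits = matvec(self.Wc, h)
        return trace, logits.index(max(logits))
```

Running the loop on a random 16x16 "image" yields a fixation trace of `steps + 1` points, all inside the image, plus a class index; only the wiring, not the learned behaviour, mirrors the method.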
Pages: 123-130 (7 pages)