Visual Attentional Network and Learning Method for Object Search and Recognition

Citations: 0
Authors
Lü J. [1]
Luo F. [1]
Yuan Z. [1]
Affiliations
[1] School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an
Keywords
Attentional model; Fixation strategy; Object detection; Reinforcement learning
DOI
10.3901/JME.2019.11.123
Abstract
A recurrent visual attention network is proposed to search for and recognize an object simultaneously. The network automatically selects a sequence of local observations and accurately localizes and recognizes objects by fusing local detailed appearance with coarse contextual visual information, making it more efficient than methods based on sliding windows or convolution over the whole image. In addition, a hybrid loss function is proposed to learn the parameters of the multi-task network end-to-end. In particular, a combination of stochastic and object-aware strategies is incorporated into the visual fixation loss, which helps mine richer context and drives the fixation point toward the object as quickly as possible. A real-world dataset is built to verify the method's ability to search for and recognize objects of interest, including small ones. Experiments show that the method predicts accurate bounding boxes for visual objects and achieves a higher search speed. The source code will be released so the method can be verified and analyzed. © 2019 Journal of Mechanical Engineering.
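The abstract describes a recurrent attention loop that repeatedly crops a local "glimpse" around the current fixation point, fuses it into a recurrent state, and predicts the next fixation, rather than convolving over the whole image. A minimal numerical sketch of that loop is shown below; the function names, dimensions, and randomly initialized toy weights are illustrative assumptions standing in for learned parameters, not the authors' actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_glimpse(image, center, size):
    """Crop a size x size patch centered at `center`, clipped to image bounds."""
    h, w = image.shape
    r = size // 2
    y0 = int(np.clip(center[0] - r, 0, h - size))
    x0 = int(np.clip(center[1] - r, 0, w - size))
    return image[y0:y0 + size, x0:x0 + size]

def attention_rollout(image, steps=4, glimpse_size=8, state_dim=16):
    """Run a toy recurrent attention loop and return the fixation trajectory."""
    h, w = image.shape
    # Toy random weights stand in for parameters learned end-to-end.
    W_g = rng.normal(0, 0.1, (state_dim, glimpse_size * glimpse_size))
    W_h = rng.normal(0, 0.1, (state_dim, state_dim))
    W_f = rng.normal(0, 0.1, (2, state_dim))
    state = np.zeros(state_dim)
    fixation = np.array([h / 2, w / 2])  # start at the image center
    trajectory = [fixation.copy()]
    for _ in range(steps):
        g = extract_glimpse(image, fixation, glimpse_size).ravel()
        state = np.tanh(W_g @ g + W_h @ state)   # fuse glimpse into recurrent state
        offset = np.tanh(W_f @ state) * 4.0      # bounded shift of the fixation point
        fixation = np.clip(fixation + offset, 0, [h - 1, w - 1])
        trajectory.append(fixation.copy())
    return trajectory

traj = attention_rollout(rng.normal(size=(32, 32)))
print(len(traj))  # 5: the starting fixation plus one per glimpse step
```

In the paper, the next-fixation policy is trained with reinforcement learning under the proposed hybrid loss; here the offset prediction is only a stand-in to show how local glimpses keep the per-step computation small.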
Pages: 123-130
Page count: 7