Visual-Haptic-Kinesthetic Object Recognition with Multimodal Transformer

Cited by: 1
Authors
Zhou, Xinyuan [1]
Lan, Shiyong [1]
Wang, Wenwu [2]
Li, Xinyang [1]
Zhou, Siyuan [1]
Yang, Hongyu [1]
Affiliations
[1] Sichuan University, College of Computer Science, Chengdu 610065, People's Republic of China
[2] University of Surrey, Guildford GU2 7XH, Surrey, England
Keywords
Object Recognition; Multimodal Deep Learning; Multimodal Fusion; Attention Mechanism; Tactile Fusion; Network
DOI
10.1007/978-3-031-44195-0_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Humans recognize objects by combining multi-sensory information in a coordinated fashion. In robotics, however, vision-based and haptics-based object recognition remain two separate research directions. Visual images and haptic time series have different properties, which makes it difficult for robots to fuse them for object recognition as humans do. In this work, we propose an architecture that fuses visual, haptic and kinesthetic data for object recognition, based on multimodal Convolutional Recurrent Neural Networks combined with a Transformer. We use Convolutional Neural Networks (CNNs) to learn spatial representations, Recurrent Neural Networks (RNNs) to model temporal relationships, and the Transformer's self-attention and cross-attention structures to focus on global and cross-modal information. We propose two fusion methods and conduct experiments on the multimodal AU dataset. The results show that our model achieves higher accuracy than the latest multimodal object recognition methods. We conduct an ablation study on the individual input components to demonstrate the importance of multimodal information in object recognition. The code will be available at https://github.com/SYLan2019/VHKOR.
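The abstract names cross-attention as the mechanism that lets one modality focus on information in another. The following is a minimal, dependency-free Python sketch of single-head scaled dot-product cross-attention. It rests on several assumptions: identity query/key/value projections (a trained model would learn these), both modalities already encoded (for example by the CNN and RNN branches) into feature sequences of the same width, and a hypothetical `fuse` helper that mean-pools and concatenates the attended sequences. It is not the authors' released implementation, and all function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """One modality (queries) attends to another (context).
    Assumption: identity projections, i.e. Q = queries, K = V = context."""
    d = len(queries[0])
    scores = matmul(queries, transpose(context))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)  # each row sums to 1
    return matmul(weights, context)


def mean_pool(seq: Matrix) -> List[float]:
    # Average over the time/token axis.
    return [sum(col) / len(seq) for col in zip(*seq)]


def fuse(visual: Matrix, haptic: Matrix) -> List[float]:
    # Hypothetical fusion: each modality attends to the other, the attended
    # sequences are mean-pooled and concatenated into one feature vector.
    v_attends_h = cross_attention(visual, haptic)
    h_attends_v = cross_attention(haptic, visual)
    return mean_pool(v_attends_h) + mean_pool(h_attends_v)


# Toy inputs: 2 visual tokens and 3 haptic time steps, feature width 4.
visual = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.8, 0.1, 0.0]]
haptic = [[0.2, 0.1, 0.0, 0.9], [0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 0.0, 0.0]]
fused = fuse(visual, haptic)
print(len(fused))  # 8
```

Letting each modality attend to the other keeps this toy fusion symmetric; a kinesthetic sequence could be added with further `cross_attention` calls in the same way, and the fused vector would then feed a classifier.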
Pages: 233-245
Number of pages: 13