Audio-Visual Fusion for Sound Source Localization and Improved Attention

Cited by: 0
Authors
Lee, Byoung-gi [1]
Choi, JongSuk [1]
Yoon, SangSuk [2]
Choi, Mun-Taek [2]
Kim, Munsang [2]
Kim, Daijin [3]
Affiliations
[1] Korea Inst Sci & Technol, Ctr Cognit Robot Res, Seoul, South Korea
[2] Korea Inst Sci & Technol, Ctr Intelligent Robot, Seoul, South Korea
[3] Postech, Dept Comp Sci & Engn, Pohang, South Korea
Keywords
Audio-Vision Fusion; Sound Source Localization; Human Attention; Robot Tracking
DOI
10.3795/KSME-A.2011.35.7.737
Chinese Library Classification (CLC) number
TH [Machinery and Instrument Industry]
Discipline classification code
0802
Abstract
Service robots are equipped with various sensors, such as vision cameras, sonar sensors, laser scanners, and microphones. Although each sensor has its own function, some can be combined to perform more complex functions. Audiovisual fusion is a typical and powerful combination of audio and video sensors, because audio and visual information complement each other. Human beings likewise depend mainly on visual and auditory information in daily life. In this paper, we conduct two studies using audiovisual fusion: one on enhancing the performance of sound localization, and the other on improving robot attention through sound localization and face detection.
Pages: 737-743
Number of pages: 7