Audio-Visual Fusion for Sound Source Localization and Improved Attention

Authors
Lee, Byoung-gi [1]
Choi, JongSuk [1]
Yoon, SangSuk [2]
Choi, Mun-Taek [2]
Kim, Munsang [2]
Kim, Daijin [3]
Affiliations
[1] Korea Inst Sci & Technol, Ctr Cognit Robot Res, Seoul, South Korea
[2] Korea Inst Sci & Technol, Ctr Intelligent Robot, Seoul, South Korea
[3] Postech, Dept Comp Sci & Engn, Pohang, South Korea
Keywords
Audio-Vision Fusion; Sound Source Localization; Human Attention; Robot Tracking
DOI
10.3795/KSME-A.2011.35.7.737
Chinese Library Classification number
TH [Machinery and Instrument Industry]
Discipline classification code
0802
Abstract
Service robots are equipped with various sensors, such as vision cameras, sonar sensors, laser scanners, and microphones. Although each sensor has its own function, some can be made to work together to perform more complex functions. Audiovisual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also depend mainly on visual and auditory information in daily life. In this paper, we conduct two studies using audiovisual fusion: one on enhancing the performance of sound localization, and the other on improving robot attention through sound localization and face detection.
Pages: 737-743
Number of pages: 7