Multi-View Learning of Acoustic Features for Speaker Recognition

Cited by: 11
Authors
Livescu, Karen [1]
Stoehr, Mark [2]
Affiliations
[1] TTI Chicago, Chicago, IL 60637 USA
[2] Univ Chicago, Chicago, IL 60637 USA
DOI
10.1109/ASRU.2009.5373462
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider learning acoustic feature transformations using an additional view of the data, in this case video of the speaker's face. Specifically, we consider a scenario in which clean audio and video are available at training time, while at test time only noisy audio is available. We use canonical correlation analysis (CCA) to learn linear projections of the acoustic observations that have maximum correlation with the video frames. We provide an initial demonstration of the approach on a speaker recognition task using data from the VidTIMIT corpus. The projected features, in combination with baseline MFCCs, outperform the baseline recognizer in noisy conditions. The techniques we present are quite general, although here we apply them to a specific speaker recognition task. To our knowledge, this is the first work in which multiple views are used to learn an acoustic feature projection at training time, while only the acoustics are used at test time.
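The train-with-two-views, test-with-one idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic arrays stand in for MFCC and video features, and the names `fit_cca`, `project`, the regularizer `reg`, and all dimensions are assumptions made only for this example. It uses a standard regularized CCA computed by whitening each view and taking an SVD of the whitened cross-covariance.

```python
import numpy as np

def inv_sqrt(C):
    # Inverse square root of a symmetric positive-definite matrix.
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T

def fit_cca(X, Y, k, reg=1e-6):
    # X: (n, dx) acoustic view, Y: (n, dy) video view (both hypothetical here).
    # Returns the acoustic mean, a (dx, k) projection, and k canonical correlations.
    n = X.shape[0]
    mean_x = X.mean(axis=0)
    Xc = X - mean_x
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, _ = np.linalg.svd(Wx @ Cxy @ Wy)
    return mean_x, Wx @ U[:, :k], s[:k]

def project(X, mean_x, A):
    # Only the acoustic view is needed to apply the learned projection.
    return (X - mean_x) @ A

# Synthetic stand-in data: a shared 2-D latent factor drives both views.
rng = np.random.default_rng(0)
n_train, n_test, d_audio, d_video, k = 500, 50, 13, 20, 2
M_audio = rng.standard_normal((2, d_audio))
M_video = rng.standard_normal((2, d_video))

z_train = rng.standard_normal((n_train, 2))
audio_train = z_train @ M_audio + 0.5 * rng.standard_normal((n_train, d_audio))
video_train = z_train @ M_video + 0.5 * rng.standard_normal((n_train, d_video))

# Training uses both views.
mean_a, A, corr = fit_cca(audio_train, video_train, k)

# Test time: noisy audio only, no video.
z_test = rng.standard_normal((n_test, 2))
audio_test = z_test @ M_audio + 1.0 * rng.standard_normal((n_test, d_audio))

# Append the projected features to the baseline acoustic features.
combined = np.hstack([audio_test, project(audio_test, mean_a, A)])
print("canonical correlations:", corr)
print("combined feature shape:", combined.shape)
```

The concatenation in the last step mirrors the abstract's statement that the projected features are used in combination with the baseline MFCCs; how the combined features feed a recognizer is outside the scope of this sketch.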
Pages: 82 / +
Page count: 2