Multi-View Learning of Acoustic Features for Speaker Recognition

Cited by: 11
Authors
Livescu, Karen [1]
Stoehr, Mark [2]
Affiliations
[1] TTI Chicago, Chicago, IL 60637 USA
[2] Univ Chicago, Chicago, IL 60637 USA
DOI
10.1109/ASRU.2009.5373462
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider learning acoustic feature transformations using an additional view of the data, in this case video of the speaker's face. Specifically, we consider a scenario in which clean audio and video are available at training time, while at test time only noisy audio is available. We use canonical correlation analysis (CCA) to learn linear projections of the acoustic observations that have maximum correlation with the video frames. We provide an initial demonstration of the approach on a speaker recognition task using data from the VidTIMIT corpus. The projected features, in combination with baseline MFCCs, outperform the baseline recognizer in noisy conditions. The techniques we present are quite general, although here we apply them to the case of a specific speaker recognition task. This is the first work of which we are aware in which multiple views are used to learn an acoustic feature projection at training time, while using only the acoustics at test time.
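The setup described in the abstract can be illustrated with a minimal sketch: fit CCA on paired audio and video features at training time, then apply only the audio-side projection at test time. This is not the authors' implementation. The function names (fit_cca, augment), the ridge term reg, the synthetic data, and the choice to combine projected and baseline features by simple concatenation are all assumptions made for illustration.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-3):
    """Fit linear CCA between two views (hypothetical helper).

    X: (n, dx) audio features, Y: (n, dy) video features.
    Returns the audio mean, the audio projection A of shape (dx, k),
    and the top-k canonical correlations.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # Regularized covariance estimates (reg is an assumed ridge term).
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with its Cholesky factor: S = L L^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)        # Lx^{-1} Sxy
    T = np.linalg.solve(Ly, T.T).T      # Lx^{-1} Sxy Ly^{-T}
    # Singular values of T are the canonical correlations.
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    A = np.linalg.solve(Lx.T, U[:, :k])  # map back: A = Lx^{-T} U
    return mx, A, s[:k]

def augment(X, mx, A):
    """Append CCA-projected features to the baseline features.

    Concatenation is an assumption; the abstract only says the
    projected features are used in combination with baseline MFCCs.
    """
    return np.hstack([X, (X - mx) @ A])

# Synthetic stand-ins for clean training audio and video features.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))
audio = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n, 6))
video = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))

# Training time: both views are available.
audio_mean, proj, corrs = fit_cca(audio, video, k=2)

# Test time: only (noisy) audio is available; video is not used.
noisy_audio = audio + 0.5 * rng.normal(size=audio.shape)
features = augment(noisy_audio, audio_mean, proj)
```

Note that the video view enters only through fit_cca. Once the projection is learned, augment needs nothing but audio, which matches the train/test asymmetry the abstract describes.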
Pages: 82 / +
Page count: 2