Multi-View Learning of Acoustic Features for Speaker Recognition

Cited by: 11
Authors
Livescu, Karen [1 ]
Stoehr, Mark [2 ]
Affiliations
[1] TTI Chicago, Chicago, IL 60637 USA
[2] Univ Chicago, Chicago, IL 60637 USA
DOI
10.1109/ASRU.2009.5373462
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider learning acoustic feature transformations using an additional view of the data, in this case video of the speaker's face. Specifically, we consider a scenario in which clean audio and video are available at training time, while at test time only noisy audio is available. We use canonical correlation analysis (CCA) to learn linear projections of the acoustic observations that have maximum correlation with the video frames. We provide an initial demonstration of the approach on a speaker recognition task using data from the VidTIMIT corpus. The projected features, in combination with baseline MFCCs, outperform the baseline recognizer in noisy conditions. The techniques we present are quite general, although here we apply them to the case of a specific speaker recognition task. This is the first work of which we are aware in which multiple views are used to learn an acoustic feature projection at training time, while using only the acoustics at test time.
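The setup described in the abstract (fit CCA on paired clean audio and video, then apply only the audio-side projection to noisy audio at test time) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `fit_cca`, the regularization constant, the feature dimensions (13 for audio, 20 for video), the number of projections, and the synthetic data are all assumptions made for illustration, and the record gives none of the paper's actual settings.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-6):
    """Fit regularized linear CCA and return the X-side projection only.

    X: (n, dx) frames of view 1 (here: acoustic features).
    Y: (n, dy) paired frames of view 2 (here: video features).
    Returns (mean_x, Wx, corrs); Wx is (dx, k), corrs are the top-k
    canonical correlations on the training data.
    """
    n = X.shape[0]
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    # Covariance estimates; the small ridge term (an assumption) keeps them invertible.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with its Cholesky factor: S = L L^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # T = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    # Map the whitened directions back to the original feature space.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    return mean_x, Wx, s[:k]

# Synthetic stand-in for paired training data: two views driven by a shared latent.
rng = np.random.default_rng(0)
n = 2000
latent = rng.standard_normal((n, 2))
audio = latent @ rng.standard_normal((2, 13)) + 0.5 * rng.standard_normal((n, 13))
video = latent @ rng.standard_normal((2, 20)) + 0.5 * rng.standard_normal((n, 20))

# Training: both views are available.
mean_a, W_a, corrs = fit_cca(audio, video, k=2)

# Test: only (noisy) audio is available, so only the audio projection is used.
noisy_audio = audio + rng.standard_normal(audio.shape)
projected = (noisy_audio - mean_a) @ W_a
# Append the projected features to the baseline features.
augmented = np.hstack([noisy_audio, projected])
```

The video view is used only inside `fit_cca`. At test time the recognizer sees `augmented`, which is computed from audio alone, mirroring the combination of projected features with baseline MFCCs mentioned in the abstract.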
Pages: 82 / +
Page count: 2
Related papers
50 in total
  • [31] Common and Unique Features Learning in Multi-view Network Embedding
    Shang, Yifan
    Ye, Xiucai
    Sakurai, Tetsuya
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [32] DeepInteract: Multi-view features interactive learning for sequential recommendation
    Gan, Mingxin
    Ma, Yingxue
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 204
  • [33] Learning Latent Features for Multi-view Clustering Based on NMF
    He, Mengjiao
    Yang, Yan
    Wang, Hongjun
    ROUGH SETS, (IJCRS 2016), 2016, 9920 : 459 - 469
  • [34] Learning discriminant features for multi-view face and eye detection
    Wang, P
    Ji, Q
    2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, : 373 - 379
  • [35] Multi-view Facial Expression Recognition using Local Appearance Features
    Hesse, Nikolas
    Gehrig, Tobias
    Gao, Hua
    Ekenel, Hazim Kemal
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 3533 - 3536
  • [36] Human action recognition using multi-view image sequences features
    Ahmad, Mohiuddin
    Lee, Seong-Whan
    PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION - PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE, 2006, : 523 - +
  • [37] Learning multi-kernel multi-view canonical correlations for image recognition
    Yun-Hao Yuan
    Yun Li
    Jianjun Liu
    Chao-Feng Li
    Xiao-Bo Shen
    Guoqing Zhang
    Quan-Sen Sun
    Computational Visual Media, 2016, 2 (02) : 153 - 162
  • [38] Multi-view dreaming: multi-view world model with contrastive learning
    Kinose A.
    Okumura R.
    Okada M.
    Taniguchi T.
    Advanced Robotics, 2023, 37 (19) : 1212 - 1220
  • [39] Learning multi-kernel multi-view canonical correlations for image recognition
    Yuan Y.-H.
    Li Y.
    Liu J.
    Li C.-F.
    Shen X.-B.
    Zhang G.
    Sun Q.-S.
    Computational Visual Media, 2016, 2 (2) : 153 - 162
  • [40] Fusion of acoustic and tokenization features for speaker recognition
    Tong, Rong
    Ma, Bin
    Lee, Kong-Aik
    You, Changhuai
    Zhu, Donglai
    Kinnunen, Tomi
    Sun, Hanwu
    Dong, Minghui
    Chng, Eng-Siong
    Li, Haizhou
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 566 - +