Multi-View Learning of Acoustic Features for Speaker Recognition

Cited by: 11
Authors
Livescu, Karen [1]
Stoehr, Mark [2]
Affiliations
[1] TTI Chicago, Chicago, IL 60637 USA
[2] Univ Chicago, Chicago, IL 60637 USA
DOI
10.1109/ASRU.2009.5373462
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider learning acoustic feature transformations using an additional view of the data, in this case video of the speaker's face. Specifically, we consider a scenario in which clean audio and video are available at training time, while at test time only noisy audio is available. We use canonical correlation analysis (CCA) to learn linear projections of the acoustic observations that have maximum correlation with the video frames. We provide an initial demonstration of the approach on a speaker recognition task using data from the VidTIMIT corpus. The projected features, in combination with baseline MFCCs, outperform the baseline recognizer in noisy conditions. The techniques we present are quite general, although here we apply them to the case of a specific speaker recognition task. This is the first work of which we are aware in which multiple views are used to learn an acoustic feature projection at training time, while using only the acoustics at test time.
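The CCA step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `fit_cca`, the ridge term `reg`, the feature dimensions, and the synthetic data standing in for MFCCs and video frames are all assumptions made for illustration. It learns a linear projection of the acoustic view from paired training data, then applies that projection to noisy audio alone and appends the result to the baseline features.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-3):
    """Learn a k-dimensional linear projection of X that is maximally
    correlated with Y. X: (n, dx) acoustic frames, Y: (n, dy) video frames.
    `reg` is a small ridge term added for numerical stability (an assumption)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return (V * w ** -0.5) @ V.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, _ = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A = Sxx_is @ U[:, :k]  # projection matrix for the acoustic view
    return mx, A, s[:k]

# Synthetic stand-in data: both views share a 2-dimensional latent signal.
rng = np.random.default_rng(0)
n, dx, dy, k = 2000, 13, 20, 2
z = rng.standard_normal((n, 2))
X_train = z @ rng.standard_normal((2, dx)) + 0.1 * rng.standard_normal((n, dx))
Y_train = z @ rng.standard_normal((2, dy)) + 0.1 * rng.standard_normal((n, dy))

# Training: both views are available.
mx, A, corrs = fit_cca(X_train, Y_train, k)

# Test: only noisy audio is available; no video is needed to apply the projection.
X_noisy = X_train[:100] + 0.5 * rng.standard_normal((100, dx))
projected = (X_noisy - mx) @ A
augmented = np.hstack([X_noisy, projected])  # baseline features plus CCA features
```

The video view is used only inside `fit_cca`; at test time the learned matrix `A` is applied to the acoustic features alone, which mirrors the train/test asymmetry the abstract describes.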
Pages: 82 / +
Page count: 2
Related papers
50 in total
  • [1] Multi-view representation learning for multi-view action recognition
    Hao, Tong
    Wu, Dan
    Wang, Qian
    Sun, Jin-Sheng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2017, 48 : 453 - 460
  • [2] Jointly Learning Multi-view Features for Human Action Recognition
    Wang, Ruoshi
    Liu, Zhigang
    Yin, Ziyang
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 4858 - 4861
  • [3] MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION
    Wang, Rui
    Ao, Junyi
    Zhou, Long
    Liu, Shujie
    Wei, Zhihua
    Ko, Tom
    Li, Qing
    Zhang, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6732 - 6736
  • [4] Automatic Multi-view Action Recognition with Robust Features
    Chou, Kuang-Pen
    Prasad, Mukesh
    Li, Dong-Lin
    Bharill, Neha
    Lin, Yu-Feng
    Hussain, Farookh
    Lin, Chin-Teng
    Lin, Wen-Chieh
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 554 - 563
  • [5] MULTI-VIEW CCA-BASED ACOUSTIC FEATURES FOR PHONETIC RECOGNITION ACROSS SPEAKERS AND DOMAINS
    Arora, Raman
    Livescu, Karen
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7135 - 7139
  • [6] DVANet: Disentangling View and Action Features for Multi-View Action Recognition
    Siddiqui, Nyle
    Tirupattur, Praveen
    Shah, Mubarak
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4873 - 4881
  • [7] Multi-view learning for visual violence recognition with maximum entropy discrimination and deep features
    Sun, Shiliang
    Liu, Yuhan
    Mao, Liang
    INFORMATION FUSION, 2019, 50 : 43 - 53
  • [8] Face Recognition Based on Multi-view Ensemble Learning
    Shi, Wenhui
    Jiang, Mingyan
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 127 - 136
  • [9] Multi-View Action Recognition using Contrastive Learning
    Shah, Ketul
    Shah, Anshul
    Lau, Chun Pong
    de Melo, Celso M.
    Chellappa, Rama
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 3370 - 3380
  • [10] Uncorrelated Multi-View Discrimination Dictionary Learning for Recognition
    Jing, Xiao-Yuan
    Hu, Rui-Min
    Wu, Fei
    Chen, Xi-Lin
    Liu, Qian
    Yao, Yong-Fang
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 2787 - 2795