Multi-View Learning of Acoustic Features for Speaker Recognition

Cited by: 11

Authors:

Livescu, Karen [1]

Stoehr, Mark [2]

Affiliations:

[1] TTI Chicago, Chicago, IL 60637 USA

[2] Univ Chicago, Chicago, IL 60637 USA

Source:

2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009) | 2009

Keywords:

DOI:

10.1109/ASRU.2009.5373462

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider learning acoustic feature transformations using an additional view of the data, in this case video of the speaker's face. Specifically, we consider a scenario in which clean audio and video are available at training time, while at test time only noisy audio is available. We use canonical correlation analysis (CCA) to learn linear projections of the acoustic observations that have maximum correlation with the video frames. We provide an initial demonstration of the approach on a speaker recognition task using data from the VidTIMIT corpus. The projected features, in combination with baseline MFCCs, outperform the baseline recognizer in noisy conditions. The techniques we present are quite general, although here we apply them to the case of a specific speaker recognition task. This is the first work of which we are aware in which multiple views are used to learn an acoustic feature projection at training time, while using only the acoustics at test time.
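
As an illustration of the idea in the abstract, the following is a minimal sketch of regularized linear CCA in NumPy. It is not the authors' implementation: the function names (`fit_cca_projection`, `project`), the ridge term `reg`, the synthetic data standing in for audio and video features, and the use of concatenation to combine projected and baseline features are all assumptions made for this example.

```python
import numpy as np

def fit_cca_projection(X, Y, k, reg=1e-3):
    """Fit linear CCA between view X (n, dx) and view Y (n, dy).

    Returns the mean of X, a (dx, k) projection matrix for X, and the
    top-k canonical correlations. `reg` is a small ridge term
    (an assumption of this sketch) that keeps the covariances invertible.
    """
    n = X.shape[0]
    mean_x = X.mean(axis=0)
    Xc = X - mean_x
    Yc = Y - Y.mean(axis=0)

    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten both views with Cholesky factors: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations (descending).
    U, s, _ = np.linalg.svd(T, full_matrices=False)

    # Map the left singular vectors back to the original X coordinates.
    A = np.linalg.solve(Lx.T, U[:, :k])
    return mean_x, A, s[:k]

def project(X, mean_x, A):
    """Apply the learned projection; only the X view is needed."""
    return (X - mean_x) @ A

# Synthetic stand-ins for the two views: both depend on a shared latent z.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal((n, 2))
audio_train = z @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((n, 6))
video_train = z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((n, 4))

# Training uses both views.
mean_x, A, corrs = fit_cca_projection(audio_train, video_train, k=2)

# Test time: only (noisy) audio is available.
audio_test = audio_train[:100] + 0.5 * rng.standard_normal((100, 6))
projected = project(audio_test, mean_x, A)

# One possible way to combine projected and baseline features (assumed here).
combined = np.hstack([audio_test, projected])
```

The point of the sketch is that the second view is used only inside `fit_cca_projection`; `project` takes the acoustic view alone, which matches the train/test asymmetry the abstract describes.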


Pages: 82 / +

Page count: 2

50 items in total

[21] MULTI-VIEW VISUAL SPEECH RECOGNITION BASED ON MULTI TASK LEARNING
Han, HouJeung
Kang, Sunghun
Yoo, Chang D.
2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3983 - 3987
[22] AUDIO-VISUAL SPEAKER IDENTIFICATION WITH MULTI-VIEW DISTANCE METRIC LEARNING
Zheng, Haomian
Wang, Meng
Li, Zhu
2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 4561 - 4564
[23] MULTI-VIEW METRIC LEARNING FOR MULTI-VIEW VIDEO SUMMARIZATION
Wang, Linbo
Fang, Xianyong
Guo, Yanwen
Fu, Yanwei
2016 INTERNATIONAL CONFERENCE ON CYBERWORLDS (CW), 2016, : 179 - 182
[24] Neural representation and learning for multi-view human action recognition
Iosifidis, Alexandros
Tefas, Anastasios
Pitas, Ioannis
2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
[25] Learning Multi-View Interactional Skeleton Graph for Action Recognition
Wang, Minsi
Ni, Bingbing
Yang, Xiaokang
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 6940 - 6954
[26] Multi-View Action Recognition by Cross-domain Learning
Nie, Weizhi
Liu, Anan
Yu, Jing
Su, Yuting
Chaisorn, Lekha
Wang, Yongkang
Kankanhalli, Mohan S.
2014 IEEE 16TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2014,
[27] Multi-view Common Space Learning for Emotion Recognition in the Wild
Wu, Jianlong
Lin, Zhouchen
Zha, Hongbin
ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 464 - 471
[28] Semi-supervised Learning with Constraints for Multi-view Object Recognition
Melacci, Stefano
Maggini, Marco
Gori, Marco
ARTIFICIAL NEURAL NETWORKS - ICANN 2009, PT II, 2009, 5769 : 653 - 662
[29] MULTI-VIEW DEEP METRIC LEARNING FOR VOLUMETRIC IMAGE RECOGNITION
Wang, Xueping
Liu, Min
2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
[30] Discriminative Multi-View Subspace Feature Learning for Action Recognition
Sheng, Biyun
Li, Jun
Xiao, Fu
Li, Qun
Yang, Wankou
Han, Junwei
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (12) : 4591 - 4600
