Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features

Cited by: 8

Authors:

Williams, Jennifer [1]

Rownicka, Joanna [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

INTERSPEECH 2019 | 2019

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC)

Keywords:

automatic speaker verification; spoofing countermeasures; speech replay detection; noise

DOI:

10.21437/Interspeech.2019-1760

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

We present our system submission to the ASVspoof 2019 Challenge Physical Access (PA) task. The objective of this challenge was to develop a countermeasure that identifies speech audio as either bona fide or intercepted and replayed. The target prediction was a score indicating whether a speech segment was bona fide (positive values) or "spoofed" (negative values). Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal processing features. The x-vector attack embeddings were created from mel-frequency cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These embeddings jointly modeled 27 different environments and 9 types of attacks from the labeled data. We also used sub-band spectral centroid magnitude coefficients (SCMCs) as features. We included an additive Gaussian noise layer during training as a form of data augmentation, to make our system more robust to previously unseen attack examples. We report system performance using the tandem detection cost function (tDCF) and equal error rate (EER). Our approach performed better than both of the challenge baselines. Our results suggest that the x-vector attack embeddings can help regularize the CNN predictions even when environments or attacks are more challenging.
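As an illustration of one of the reported metrics, the following is a minimal sketch of how an equal error rate (EER) could be estimated from countermeasure scores, assuming the convention stated in the abstract that higher (positive) scores indicate bona fide speech. The function name `compute_eer`, the threshold sweep over observed scores, and the example scores are assumptions made for this sketch; it is not the authors' code or the official ASVspoof scoring tool, and it does not cover tDCF.

```python
import numpy as np


def compute_eer(bona_fide_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    Returns (eer, threshold) at the point where the false acceptance and
    false rejection rates are closest.
    """
    bona = np.asarray(bona_fide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([bona, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bona < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Operating point where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    bona = [0.5, -0.5, 1.0, 2.0]
    spoof = [-1.0, 0.6, -2.0, -0.7]
    eer, thr = compute_eer(bona, spoof)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With only a handful of trials the estimate is coarse; scoring tools used in practice typically interpolate between operating points rather than picking the nearest observed threshold.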


Pages: 1053-1057

Page count: 5

50 records in total

[31] REPLAY-ATTACK DETECTION USING FEATURES WITH ADAPTIVE SPECTRO-TEMPORAL RESOLUTION
Liu, Meng
Wang, Longbiao
Lee, Kong Aik
Chen, Xuanda
Dang, Jianwu
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6374 - 6378
[32] Experimental analysis of features for replay attack detection-Results on the ASVspoof 2017 Challenge
Font, Roberto
Espin, Juan M.
Jose Cano, Maria
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 7 - 11
[33] Device Features Based on Linear Transformation With Parallel Training Data for Replay Speech Detection
Xu, Longting
Yang, Jichen
You, Chang Huai
Qian, Xinyuan
Huang, Daiyu
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1574 - 1586
[34] Replay Attack Detection Using Linear Prediction Analysis-Based Relative Phase Features
Phapatanaburi, Khomdet
Wang, Longbiao
Nakagawa, Seiichi
Iwahashi, Masahiro
IEEE ACCESS, 2019, 7 : 183614 - 183625
[35] TRANSMISSION LINE COCHLEAR MODEL BASED AM-FM FEATURES FOR REPLAY ATTACK DETECTION
Gunendradasan, Tharshini
Irtza, Saad
Ambikairajah, Eliathamby
Epps, Julien
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6136 - 6140
[36] Detection of replay spoof speech using global self-attentive Teager energy features
Chen, Ming
Chen, Xueqin
Shengxue Xuebao/Acta Acustica, 2024, 49 (05) : 1122 - 1130
[37] Replay Attack Detection Using Integrated Glottal Excitation Based Group Delay Function and Cepstral Features
Chaudhari, Amol
Shedge, Dnyandeo
Bairagi, Vinayak
Nanthaamornphong, Aziz
SYMMETRY-BASEL, 2024, 16 (07):
[38] Vector quantization of speech spectral parameters using statistics of static and dynamic features
Koishida, K
Tokuda, K
Masuko, T
Kobayashi, T
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2001, E84D (10) : 1427 - 1434
[39] Speaker Recognition using Multiple X-Vector Speaker Representations with Two-Stage Clustering and Outlier Detection Refinement
Shrestha, Roman
Glackin, Cornelius
Wall, Julie
Cannings, Nigel
Rajwadi, Marvin
Kada, Satya
Laird, James
Laird, Thea
Woodruff, Chris
2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 330 - 335
[40] Synthetic speech detection using fundamental frequency variation and spectral features
Pal, Monisankha
Paul, Dipjyoti
Saha, Goutam
COMPUTER SPEECH AND LANGUAGE, 2018, 48 : 31 - 50
