Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features

Cited by: 8
Authors
Williams, Jennifer [1]
Rownicka, Joanna [1]
Affiliation
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Source
INTERSPEECH 2019, 2019
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
automatic speaker verification; spoofing countermeasures; speech replay detection; noise
DOI
10.21437/Interspeech.2019-1760
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
We present our system submission to the ASVspoof 2019 Challenge Physical Access (PA) task. The objective of this challenge was to develop a countermeasure that identifies speech audio as either bona fide or intercepted and replayed. The target prediction was a score indicating whether a speech segment was bona fide (positive values) or "spoofed" (negative values). Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal processing features. The x-vector attack embeddings were created from mel-frequency cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These embeddings jointly modeled 27 different environments and 9 types of attacks from the labeled data. We also used sub-band spectral centroid magnitude coefficients (SCMCs) as features. We included an additive Gaussian noise layer during training to augment the data and make our system more robust to previously unseen attacks. We report system performance using the tandem detection cost function (t-DCF) and equal error rate (EER). Our approach performed better than both of the challenge baselines. Our results suggest that the x-vector attack embeddings can help regularize the CNN predictions even when environments or attacks are more challenging.
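The EER mentioned above is the operating point where the rate of rejected bona fide trials equals the rate of accepted spoofed trials. A minimal sketch of how it can be computed from countermeasure scores follows, assuming the abstract's convention that higher (positive) scores mean bona fide and that a trial is accepted when its score is at or above the threshold. The function name `compute_eer` is hypothetical; this is not the official ASVspoof scoring script, and the t-DCF is not shown because it additionally requires ASV system scores and cost parameters.

```python
from typing import Sequence, Tuple


def compute_eer(bona_fide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: return (eer, threshold).

    Assumes higher scores indicate bona fide speech and that a trial is
    accepted as bona fide when score >= threshold.
    """
    if not bona_fide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    thresholds.append(float("inf"))

    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # Where FRR and FAR do not cross exactly, average them.
            best = (gap, (frr + far) / 2, t)

    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([2.0, 1.5, 1.0, 0.5], [-1.0, -0.5, 0.0, 0.75])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With these toy scores, one bona fide trial (0.5) falls below the threshold and one spoofed trial (0.75) reaches it, so both error rates are 25%. Scanning every observed score keeps the sketch simple at quadratic cost; a sorted, cumulative-count version would scale better to full evaluation sets.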
Pages: 1053-1057
Number of pages: 5