Effectiveness of Speech Demodulation-Based Features for Replay Detection

Cited by: 40

Authors:

Kamble, Madhu R. [1]

Tak, Hemlata [1]

Patil, Hemant A. [1]

Affiliation:

[1] DA IICT, Speech Res Lab, Gandhinagar, Gujarat, India

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Spoofing; Hilbert transform; Teager energy operator; energy separation algorithm; AUTOMATIC SPEAKER VERIFICATION; ENERGY SEPARATION; COUNTERMEASURES; FREQUENCY;

DOI:

10.21437/Interspeech.2018-1675

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Replay attacks pose a serious threat to Automatic Speaker Verification (ASV) systems. Speech can be modeled as an amplitude- and frequency-modulated (AM-FM) signal. In this paper, we explore speech demodulation-based features using the Hilbert transform (HT) and the Teager Energy Operator (TEO) for replay detection. In particular, we propose HT-based Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) Cosine Coefficients (i.e., HT-IACC and HT-IFCC) and Energy Separation Algorithm (ESA)-based features (i.e., ESA-IACC and ESA-IFCC). To adapt the instantaneous energy with respect to a given sampling frequency, ESA requires 3 samples, whereas HT requires a relatively large number of samples; thus, ESA gives high time resolution. The experiments were performed on the ASVspoof 2017 Challenge database for replay spoof speech detection (SSD). The experimental results show that ESA-based features gave lower EER. In addition, a linearly spaced Gabor filterbank gave lower EER than a Butterworth filterbank. To explore possible complementary information in amplitude and frequency, we used score-level fusion of IA and IF. With the HT-based feature set, score-level fusion gave an EER of 5.24 % (dev) and 10.03 % (eval), whereas the ESA-based feature set reduced the EER to 2.01 % (dev) and 9.64 % (eval).
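The TEO/ESA demodulation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which ESA variant the authors used, so the DESA-2 form is assumed here; the function names `teager_energy` and `desa2` and the `eps` guard are hypothetical choices for illustration, not the paper's implementation. The sketch also omits the filterbank and cosine-coefficient stages.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Uses 3 consecutive samples per output value; the result is two
    samples shorter than x because the endpoints are dropped.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, fs, eps=1e-12):
    """Estimate instantaneous amplitude and frequency with DESA-2 (assumed variant).

    x  : list of samples of a (narrowband) signal
    fs : sampling frequency in Hz
    Returns (ia, inst_freq_hz), each len(x) - 4 samples long.
    Frequency estimates are only valid below fs / 4.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1], for n = 1 .. len(x) - 2
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[k] corresponds to sample n = k + 1
    psi_y = teager_energy(y)  # psi_y[k] corresponds to sample n = k + 2

    ia, inst_freq_hz = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align both energies at sample n = k + 2
        if px <= eps or py <= eps:
            # Energy too small (or negative) to demodulate reliably
            ia.append(0.0)
            inst_freq_hz.append(0.0)
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega = 0.5 * math.acos(arg)               # radians per sample
        inst_freq_hz.append(omega * fs / (2.0 * math.pi))
        ia.append(2.0 * px / math.sqrt(py))
    return ia, inst_freq_hz
```

For a pure tone x[n] = A cos(Ωn + φ) with Ω < π/2, these expressions recover A and Ω exactly, which makes a synthetic sinusoid a convenient sanity check before applying the sketch to subband speech signals.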


Pages: 641-645

Page count: 5


