Effectiveness of Speech Demodulation-Based Features for Replay Detection

Cited by: 40

Authors:

Kamble, Madhu R. [1]

Tak, Hemlata [1]

Patil, Hemant A. [1]

Affiliations:

[1] DA IICT, Speech Res Lab, Gandhinagar, Gujarat, India

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Spoofing; Hilbert transform; Teager energy operator; energy separation algorithm; AUTOMATIC SPEAKER VERIFICATION; ENERGY SEPARATION; COUNTERMEASURES; FREQUENCY;

DOI:

10.21437/Interspeech.2018-1675

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Replay attacks pose a serious threat to Automatic Speaker Verification (ASV) systems. Speech can be modeled as an amplitude- and frequency-modulated (AM-FM) signal. In this paper, we explore speech demodulation-based features using the Hilbert transform (HT) and the Teager Energy Operator (TEO) for replay detection. In particular, we propose HT-based Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) Cosine Coefficients (i.e., HT-IACC and HT-IFCC) and Energy Separation Algorithm (ESA)-based features (i.e., ESA-IACC and ESA-IFCC). To adapt the instantaneous energy w.r.t. a given sampling frequency, ESA requires 3 samples, whereas HT requires a relatively large number of samples; thus, ESA gives high time resolution. The experiments were performed on the ASVspoof 2017 Challenge database for replay spoof speech detection (SSD). The experimental results show that the ESA-based features gave a lower equal error rate (EER) than the HT-based features. In addition, a linearly spaced Gabor filterbank gave a lower EER than a Butterworth filterbank. To explore possible complementary information in amplitude and frequency, we used score-level fusion of IA and IF. With the HT-based feature set, score-level fusion gave an EER of 5.24 % (dev) and 10.03 % (eval), whereas the ESA-based feature set reduced the EER to 2.01 % (dev) and 9.64 % (eval).
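
The energy-separation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the DESA-2 variant of ESA, which the abstract does not name, and it works on a single narrowband signal rather than on filterbank subbands. The function names `teo` and `desa2` are hypothetical and are not taken from the paper. Each Teager energy value is computed from only three adjacent samples, which is what gives the operator its fine time resolution.

```python
import math


def teo(x):
    """Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Each output value depends on three adjacent input samples.
    The returned list is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, fs):
    """Estimate instantaneous amplitude and frequency (Hz) with DESA-2.

    Assumption: DESA-2 is used here only for illustration; the abstract
    does not say which ESA variant the authors used.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1]; y[k] aligns with x[k+1].
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teo(x)  # psi_x[i] aligns with x[i+1]
    psi_y = teo(y)  # psi_y[j] aligns with x[j+2]
    amplitude, frequency = [], []
    for j, py in enumerate(psi_y):
        px = psi_x[j + 1]  # same time index as psi_y[j]
        if px <= 0.0 or py <= 0.0:
            # Energy estimates can go non-positive on noisy input; emit zeros.
            amplitude.append(0.0)
            frequency.append(0.0)
            continue
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega = 0.5 * math.acos(c)  # radians per sample
        amplitude.append(2.0 * px / math.sqrt(py))
        frequency.append(omega * fs / (2.0 * math.pi))
    return amplitude, frequency


if __name__ == "__main__":
    fs = 16000  # illustrative sampling rate, not taken from the paper
    tone = [0.8 * math.cos(2 * math.pi * 500 * n / fs + 0.3) for n in range(200)]
    ia, inst_freq = desa2(tone, fs)
    print(ia[0], inst_freq[0])
```

For a pure cosine the DESA-2 estimates are exact up to floating-point error, so the sketch is easy to sanity-check on synthetic tones before applying it to band-limited speech.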


Pages: 641-645

Page count: 5
