AUTOMATIC SPEAKER VERIFICATION ON COMPRESSED AUDIO

Cited by: 0
Authors
Sokol, Oleksandra [1]
Naumenko, Heorhii [1]
Derkach, Viacheslav [1]
Kuznetsov, Vasyl [1]
Progonov, Dmytro [1, 2]
Husiev, Volodymyr [1]
Affiliations
[1] Samsung Elect LLC, Samsung R&D Inst Ukraine, Kiev, Ukraine
[2] Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
Keywords
Automatic Speaker Verification; VoIP; Hard Samples Mining
DOI
10.1109/DESSERT58054.2022.10018734
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Voice-based phishing and wire-fraud attacks have become a pressing problem in recent years due to the emergence of advanced AI-based speech synthesis models. These models can generate a realistic speech signal of a known target that is difficult to distinguish from the bona fide voice of a real human. Recent security reports on vishing attacks against bank call centers, fraud or pranking of public figures, and spoofing of voice authentication systems show that this is a real issue. Current approaches to voice fraud are based on applying an Automatic Speaker Verification (ASV) system. In most cases, these systems are tuned on datasets that consist of wideband-quality bona fide and spoofed voice samples. This makes ASV systems vulnerable to the speech signal degradation caused by voice encoding in cellular networks and Voice over IP (VoIP). However, performance evaluation of ASV (namely, Equal Error Rate (EER) estimation) is available almost exclusively for cellular networks. Thus, the performance of ASV systems for modern VoIP applications remains unclear. In this paper, we evaluate modern ASV systems on audio compressed with codecs used in both cellular networks (AMR and GSM codecs) and VoIP applications (G.711, G.722, AAC, Lyra, and Opus codecs). In addition, ASV performance was tested using a popular VoIP application (Discord). The results show that applying a codec leads to a considerable (up to twofold) increase in EER compared to the baseline results. Moreover, we observed an up to threefold increase in EER on data transmitted using Discord. We propose applying hard sample mining during training to improve the accuracy of ASV systems on compressed voice samples. This reduces EER from 21% to 16% even for the most distorted samples, obtained after aggressive voice compression by GSM codecs. The improvement for a real VoIP application is even greater: EER on Discord data decreases from 35% to 20%.
Pages: 7
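
The abstract reports all results as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER can be estimated from verification scores, not the authors' evaluation code. The function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are assumptions made for illustration only.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate EER from similarity scores (higher = more likely same speaker).

    Illustrative sketch: sweeps every observed score as a decision threshold
    and returns the point where false acceptance and false rejection rates
    are closest. Returns (eer, threshold).
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores invented for this example; they are not data from the paper.
toy_genuine = [0.9, 0.8, 0.7, 0.4]
toy_impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(toy_genuine, toy_impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch averages them at the closest threshold. Evaluation toolkits often interpolate along the error curve instead, which can give slightly different values.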