AUTOMATIC SPEAKER VERIFICATION ON COMPRESSED AUDIO

Cited by: 0
Authors
Sokol, Oleksandra [1]
Naumenko, Heorhii [1]
Derkach, Viacheslav [1]
Kuznetsov, Vasyl [1]
Progonov, Dmytro [1, 2]
Husiev, Volodymyr [1]
Affiliations
[1] Samsung Elect LLC, Samsung R&D Inst Ukraine, Kiev, Ukraine
[2] Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
Keywords
Automatic Speaker Verification; VoIP; Hard Samples Mining;
DOI
10.1109/DESSERT58054.2022.10018734
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Voice-based phishing and wire-fraud attacks have become a pressing problem in recent years due to the emergence of advanced AI-based speech synthesis models. These models can generate a realistic speech signal of a known target that is difficult to distinguish from the bona fide voice of a real human. Recent security reports on vishing attacks against bank call centers, fraud or pranking of public figures, and spoofing of voice authentication systems show that this is a real issue. Current approaches to voice fraud rely on Automatic Speaker Verification (ASV) systems. In most cases, these systems are tuned on datasets of wideband-quality bona fide and spoofed voice samples, which makes them vulnerable to the speech signal degradation caused by voice encoding in cellular networks and Voice over IP (VoIP). However, ASV performance evaluations, namely Equal Error Rate (EER) estimates, are available almost exclusively for cellular networks, so the performance of ASV systems in modern VoIP applications remains unclear. In this paper, we evaluate modern ASV systems on audio compressed with codecs used in both cellular networks (AMR and GSM) and VoIP applications (G.711, G.722, AAC, Lyra, and Opus). In addition, ASV performance was tested on audio transmitted through a popular VoIP application (Discord). The results show that applying a codec increases EER considerably (up to two times) compared with the baseline, and we observed up to a threefold increase in EER on data transmitted using Discord. We propose applying hard samples mining during training to improve the accuracy of ASV systems on compressed voice samples. This reduces EER from 21% to 16% even for the most distorted samples, obtained after aggressive voice compression by GSM codecs. The improvement for a real VoIP application is even larger: EER on Discord data decreases from 35% to 20%.
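The abstract reports every result as an Equal Error Rate, the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER can be estimated from two lists of trial scores. It is not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "accept", and the example scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate EER and its threshold (assumption: higher score = accept)."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every distinct score observed in either set.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scored at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # The EER is taken where the two error curves are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]


# Hypothetical scores, invented for this example only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite number of trials the two error curves rarely cross exactly, so this sketch averages the two rates at the closest point. Evaluation toolkits often interpolate between neighbouring thresholds instead, which can give slightly different values.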
Pages: 7