AUTOMATIC SPEAKER VERIFICATION ON COMPRESSED AUDIO

Authors
Sokol, Oleksandra [1]
Naumenko, Heorhii [1]
Derkach, Viacheslav [1]
Kuznetsov, Vasyl [1]
Progonov, Dmytro [1,2]
Husiev, Volodymyr [1]
Affiliations
[1] Samsung Elect LLC, Samsung R&D Inst Ukraine, Kiev, Ukraine
[2] Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
Keywords
Automatic Speaker Verification; VoIP; Hard Samples Mining
DOI
10.1109/DESSERT58054.2022.10018734
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Voice-based phishing and wire-fraud attacks have become a pressing problem in recent years due to the emergence of advanced AI-based speech synthesis models. These models can generate a realistic speech signal of a known target that is difficult to distinguish from the bona fide voice of a real human. Recent security reports confirm the issue, describing vishing attacks on bank call centers, fraud against or pranking of public figures, and spoofing of voice authentication systems. Current approaches to voice fraud rely on Automatic Speaker Verification (ASV) systems. In most cases, these systems are tuned on datasets that consist of wideband-quality bona fide and spoofed voice samples. This makes ASV systems vulnerable to the speech signal degradation caused by voice encoding in cellular networks and Voice over IP (VoIP). However, performance evaluation of ASV systems (namely, Equal Error Rate (EER) estimation) is available almost exclusively for cellular networks. Thus, the performance of ASV systems for modern VoIP applications remains unclear. In this paper, we evaluate modern ASV systems on audio compressed with codecs used in both cellular networks (AMR and GSM codecs) and VoIP applications (G.711, G.722, AAC, Lyra, and Opus codecs). In addition, ASV performance was tested using a popular VoIP application (Discord). The results show that applying a codec leads to a considerable (up to twofold) EER increase compared to the baseline. Moreover, we observed an up to threefold increase in EER on data transmitted using Discord. We propose applying hard samples mining during training to improve the accuracy of ASV systems on compressed voice samples. This reduces EER from 21% to 16% even for the most distorted samples, obtained after aggressive voice compression by GSM codecs. Notably, the improvement for a real VoIP application is even larger: EER on Discord data decreases from 35% to 20%.
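The abstract reports all results as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER is commonly estimated from verification scores; it is not the authors' evaluation code. The function name `equal_error_rate` and the toy scores are hypothetical and chosen only for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate EER by sweeping a decision threshold over all observed scores.

    genuine_scores: scores for trials where the claimed speaker is the true speaker.
    impostor_scores: scores for trials where the claim is false.
    Higher scores are assumed to mean "more likely the same speaker".
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy, made-up scores (not data from the paper).
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

With a finite set of trials, FAR and FRR rarely cross exactly, so the sketch averages the two rates at the threshold where they are closest. Production toolkits typically interpolate along the ROC curve instead.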
Pages: 7
    Laskri, Mohamed Tayeb
    INTERNATIONAL JOURNAL OF BIOMETRICS, 2013, 5 (3-4) : 248 - 265