AUTOMATIC SPEAKER VERIFICATION ON COMPRESSED AUDIO

Authors
Sokol, Oleksandra [1]
Naumenko, Heorhii [1]
Derkach, Viacheslav [1]
Kuznetsov, Vasyl [1]
Progonov, Dmytro [1,2]
Husiev, Volodymyr [1]
Affiliations
[1] Samsung Elect LLC, Samsung R&D Inst Ukraine, Kiev, Ukraine
[2] Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
Keywords
Automatic Speaker Verification; VoIP; Hard Samples Mining;
DOI
10.1109/DESSERT58054.2022.10018734
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Voice-based phishing and wire-fraud attacks have become a pressing problem in recent years due to the emergence of advanced AI-based speech synthesis models. These models can generate a realistic speech signal of a known target that is difficult to distinguish from the bona fide voice of a real human. Recent security reports on vishing attacks against bank call centers, fraud or pranking of public figures, and spoofing of voice authentication systems confirm that this is a real issue. Current approaches to voice fraud rely on an Automatic Speaker Verification (ASV) system. In most cases, these systems are tuned on datasets of wideband-quality bona fide and spoofed voice samples. This makes ASV systems vulnerable to the speech signal degradation caused by voice encoding in cellular networks and Voice over IP (VoIP). However, performance evaluations of ASV (namely Equal Error Rate (EER) estimates) are available almost exclusively for cellular networks. Thus, the performance of ASV systems for modern VoIP applications remains unclear. In this paper, we evaluate modern ASV systems on audio compressed with codecs used in both cellular networks (AMR and GSM codecs) and VoIP applications (G.711, G.722, AAC, Lyra, and Opus codecs). In addition, ASV performance was tested using a popular VoIP application (Discord). The results show that applying a codec leads to a considerable (up to twofold) increase in EER compared to the baseline. Moreover, we observed up to a threefold increase in EER on data transmitted through Discord. We propose applying hard samples mining during training to improve the accuracy of ASV systems on compressed voice samples. It reduces EER from 21% to 16% even for the most distorted samples, obtained after aggressive voice compression by GSM codecs. Note that the improvement for a real VoIP application is even larger: EER on Discord data decreases from 35% to 20%.
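For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it could be estimated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all assumptions made for illustration, and higher scores are assumed to indicate a better match to the claimed speaker.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means "more likely the claimed speaker".
    Returns a tuple (eer, threshold).
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best = (None, None)
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best = ((far + frr) / 2, threshold)
    return best


# Toy scores invented for illustration only (not data from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER ~ {eer:.2f} at threshold {threshold}")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest. Evaluation toolkits often interpolate between neighboring thresholds instead.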
Pages: 7