DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection

Authors
Rabbia Mahum
Aun Irtaza
Ali Javed
Haitham A. Mahmoud
Haseeb Hassan
Affiliations
[1] UET Taxila, Computer Science Department
[2] UET Taxila, Software Engineering Department
[3] King Saud University, Industrial Engineering Department, College of Engineering
[4] Shenzhen Technology University (SZTU), College of Big Data and Internet
Source
EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2024
Keywords
Deep learning; Spoofing detector; Fake speech detection
Abstract
Advances in artificial intelligence have made spoofed speech a growing threat to society, so automatic speaker verification (ASV) systems need an automated spoofing detector that can be integrated into them. In this study, we propose DeepDet, a novel and robust deep-layered model that classifies speech as either spoofed or bonafide. DeepDet is an improved version of Yet Another Mobile Network (YAMNet) that combines a customized MobileNet with a bottleneck attention module (BAM). First, we convert each audio sample into a mel-spectrogram, a time–frequency representation on the mel scale. Second, we train the model on mel-spectrograms extracted from the Logical Access (LA) set of the ASVspoof 2019 dataset, which contains synthesized speech and voice conversions. Finally, we classify audio samples with the trained binary classifier. The layered architecture and guided attention allow the model to distinguish spoofed speech from bonafide samples. The model uses depthwise separable convolutions, which make it lighter than existing techniques. We ran extensive experiments on the ASVspoof 2019 corpus and attained an equal error rate (EER) of 0.042% on Logical Access (LA) attacks and 0.43% on Physical Access (PA) attacks. These results indicate that DeepDet outperforms existing spoofing detectors on the ASVspoof 2019 dataset. The model is also robust enough to identify unseen spoofed audio and to classify the different attack types accurately.
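
Two claims in the abstract can be made concrete with a small sketch: that depthwise separable convolutions reduce model size, and that performance is reported as an equal error rate. The code below is illustrative only and is not taken from the paper. The function names, the layer sizes (3x3 kernel, 64 input and 128 output channels), the toy scores, and the convention that a higher score means "more likely bonafide" are all assumptions. The EER here is a simple threshold sweep, not the official ASVspoof scoring tool.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2-D convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored):
    one kernel x kernel filter per input channel, then a 1x1 pointwise mix."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    Assumption: a sample is accepted as bonafide when score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: bonafide samples scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed samples scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 64 -> 128 channels.
    print(standard_conv_params(3, 64, 128))   # 73728
    print(separable_conv_params(3, 64, 128))  # 8768

    # Toy scores, invented for illustration.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(bonafide, spoof))  # 0.25
```

For the hypothetical layer above, the separable form needs roughly 8.4 times fewer weights than the standard convolution, which is the kind of saving the abstract refers to. An EER of 0.042% corresponds to a returned value of 0.00042 under this function's convention.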