Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features

Cited: 0
Authors
Tong, Chuxuan [1 ]
Natgunanathan, Iynkaran [1 ]
Xiang, Yong [1 ]
Li, Jianhua [1 ]
Zong, Tianrui [1 ,2 ]
Zheng, Xi [3 ]
Gao, Longxiang [4 ,5 ]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Geelong, Vic 3125, Australia
[2] Kexin Technol Ltd, Beijing 100020, Peoples R China
[3] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
[4] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Key Lab Comp Power Network & Informat Secur, Minist, Jinan 250316, Peoples R China
[5] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Power Internet & Serv C, Jinan 250316, Peoples R China
Funding
Australian Research Council;
Keywords
Watermarking; Feature extraction; Robustness; Decoding; Training; Acoustics; Transformers; Perturbation methods; Generators; Data mining; Audio watermarking; deep neural networks; watermark attacks; TIME-SCALE MODIFICATION; AUDIO WATERMARKING; ATTACKS; SCHEME; DOMAIN;
DOI
10.1109/TASLP.2024.3486206
CLC number
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
Digital watermarking serves as an effective approach for safeguarding speech signal copyrights: ownership information is embedded into the original signal and subsequently extracted from the watermarked signal. While traditional watermarking methods can embed and extract watermarks successfully when the watermarked signal is not subjected to severe alterations, they cannot withstand attacks such as de-synchronization. In this work, we introduce a novel transformer-based framework designed to enhance the imperceptibility and robustness of speech watermarking. The framework incorporates encoders and decoders built on multi-scale transformer blocks to effectively capture local and long-range features of the inputs, such as acoustic features extracted by the Short-Time Fourier Transform (STFT). Further, a deep neural network (DNN) based generator, built on the Transformer architecture, is employed to adaptively embed imperceptible watermark perturbations; these perturbations also serve to simulate noise, thereby bolstering watermark robustness during the training phase. Experimental results show the superiority of the proposed framework in terms of watermark imperceptibility and robustness against various watermark attacks. Compared to currently available related techniques, the framework achieves an eightfold increase in embedding rate. It also offers superior practicality, with scalability and reduced inference time of the DNN models.
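To make the two ingredients of the abstract concrete — STFT acoustic features feeding the encoder/decoder, and an additive, signal-scaled watermark perturbation — here is a minimal numpy sketch. It is purely illustrative, not the authors' model: the `stft`, `embed`, and `alpha` names, the naive Hann-window STFT, and the random stand-in for the generator's output are all assumptions for demonstration.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive STFT: Hann-windowed frames -> complex spectra (frames x bins)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def embed(x, perturbation, alpha=0.01):
    """Additively embed a watermark perturbation, scaled to the host signal
    so the modification stays small relative to the signal's peak level."""
    scale = alpha * np.max(np.abs(x))
    return x + scale * perturbation / (np.max(np.abs(perturbation)) + 1e-12)

rng = np.random.default_rng(0)
host = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1 s of a 440 Hz tone at 8 kHz
feats = np.abs(stft(host))                               # magnitude features a DNN encoder could consume
wm = rng.standard_normal(len(host))                      # stand-in for the generator's perturbation
watermarked = embed(host, wm, alpha=0.01)

# SNR of the host relative to the embedded perturbation, in dB
snr = 10 * np.log10(np.sum(host**2) / np.sum((watermarked - host)**2))
```

In the paper's framework the perturbation is produced adaptively by a Transformer-based generator rather than drawn at random, and robustness comes from simulating attacks on `watermarked` during training; the sketch only shows the embedding arithmetic.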
Pages: 4822-4837
Page count: 16
Related papers
50 items total
  • [1] TRANSFORMER-BASED ACOUSTIC MODELING FOR HYBRID SPEECH RECOGNITION
    Wang, Yongqiang
    Mohamed, Abdelrahman
    Le, Duc
    Liu, Chunxi
    Xiao, Alex
    Mahadeokar, Jay
    Huang, Hongzhao
    Tjandra, Andros
    Zhang, Xiaohui
    Zhang, Frank
    Fuegen, Christian
    Zweig, Geoffrey
    Seltzer, Michael L.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6874 - 6878
  • [2] Transformer-based Acoustic Modeling for Streaming Speech Synthesis
    Wu, Chunyang
    Xiu, Zhiping
    Shi, Yangyang
    Kalinli, Ozlem
    Fuegen, Christian
    Koehler, Thilo
    He, Qing
    INTERSPEECH 2021, 2021, : 146 - 150
  • [3] Enhancing tourism demand forecasting with a transformer-based framework
    Li, Xin
    Xu, Yechi
    Law, Rob
    Wang, Shouyang
    ANNALS OF TOURISM RESEARCH, 2024, 107
  • [4] InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance
    Wang, Xingqiao
    Xu, Xiaowei
    Tong, Weida
    Roberts, Ruth
    Liu, Zhichao
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2021, 4
  • [5] Transformer-based Summarization by Exploiting Social Information
    Minh-Tien Nguyen
    Van-Chien Nguyen
    Huy-The Vu
    Van-Hau Nguyen
    2020 12TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (IEEE KSE 2020), 2020, : 25 - 30
  • [6] On Robustness of Finetuned Transformer-based NLP Models
    Neerudu, Pavan Kalyan Reddy
    Oota, Subba Reddy
    Marreddy, Mounika
    Kagita, Venkateswara Rao
    Gupta, Manish
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 7180 - 7195
  • [7] A transformer-based network for speech recognition
    Tang L.
    International Journal of Speech Technology, 2023, 26 (02) : 531 - 539
  • [8] TRANSFORMER IN ACTION: A COMPARATIVE STUDY OF TRANSFORMER-BASED ACOUSTIC MODELS FOR LARGE SCALE SPEECH RECOGNITION APPLICATIONS
    Wang, Yongqiang
    Shi, Yangyang
    Zhang, Frank
    Wu, Chunyang
    Chan, Julian
    Yeh, Ching-Feng
    Xiao, Alex
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6778 - 6782
  • [9] Enhancing Image Captioning with Transformer-Based Two-Pass Decoding Framework
    Su, Jindian
    Mou, Yueqi
    Xie, Yunhao
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT I, ICIC 2024, 2024, 14875 : 171 - 183
  • [10] Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition
    Lee, Mun-Hak
    Lee, Sang-Eon
    Seong, Ju-Seok
    Chang, Joon-Hyuk
    Kwon, Haeyoung
    Park, Chanhee
    INTERSPEECH 2022, 2022, : 56 - 60