Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features

Cited by: 0
Authors
Tong, Chuxuan [1 ]
Natgunanathan, Iynkaran [1 ]
Xiang, Yong [1 ]
Li, Jianhua [1 ]
Zong, Tianrui [1 ,2 ]
Zheng, Xi [3 ]
Gao, Longxiang [4 ,5 ]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Geelong, Vic 3125, Australia
[2] Kexin Technol Ltd, Beijing 100020, Peoples R China
[3] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
[4] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Key Lab Comp Power Network & Informat Secur, Minist, Jinan 250316, Peoples R China
[5] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Power Internet & Serv C, Jinan 250316, Peoples R China
Funding
Australian Research Council;
Keywords
Watermarking; Feature extraction; Robustness; Decoding; Training; Acoustics; Transformers; Perturbation methods; Generators; Data mining; Audio watermarking; deep neural networks; watermark attacks; TIME-SCALE MODIFICATION; AUDIO WATERMARKING; ATTACKS; SCHEME; DOMAIN;
DOI
10.1109/TASLP.2024.3486206
CLC number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
Digital watermarking serves as an effective approach for safeguarding speech signal copyrights: ownership information is embedded into the original signal and subsequently extracted from the watermarked signal. While traditional watermarking methods can embed and extract watermarks successfully when the watermarked signals are not exposed to severe alterations, they cannot withstand attacks such as de-synchronization. In this work, we introduce a novel transformer-based framework designed to enhance the imperceptibility and robustness of speech watermarking. The framework incorporates encoders and decoders built on multi-scale transformer blocks to effectively capture local and long-range features from inputs such as the acoustic features extracted by the Short-Time Fourier Transform (STFT). Further, a deep neural network (DNN)-based generator, built on the Transformer architecture, is employed to adaptively embed imperceptible watermark perturbations; these perturbations also serve to simulate noise, thereby bolstering watermark robustness during the training phase. Experimental results show the superiority of the proposed framework in terms of watermark imperceptibility and robustness against various watermark attacks. Compared to currently available related techniques, the framework achieves an eightfold increase in embedding rate, and it offers superior practicality through scalability and reduced inference time of the DNN models.
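The abstract describes an embed-attack-decode pipeline trained end to end on STFT features. The sketch below is an illustration only, not the authors' code: module names such as WatermarkEmbedder and WatermarkDecoder, the layer sizes, and the additive-noise attack are all assumptions standing in for the paper's multi-scale transformer blocks and attack simulation. It shows how STFT magnitudes can be fed to transformer encoders to embed and recover a bit payload, with a simulated attack inserted between embedding and decoding during training.

```python
# Hedged sketch of a transformer-based speech watermarking loop (assumed design,
# not the published implementation): STFT features -> transformer embedder adds a
# small perturbation carrying the bits -> simulated attack -> transformer decoder
# recovers the bits.
import torch
import torch.nn as nn

N_FFT, HOP = 512, 128          # STFT analysis parameters (illustrative values)
D_MODEL, N_BITS = 128, 32      # transformer width and watermark payload size


def stft_features(wave: torch.Tensor) -> torch.Tensor:
    """Magnitude spectrogram as (batch, frames, freq_bins) acoustic features."""
    spec = torch.stft(wave, n_fft=N_FFT, hop_length=HOP,
                      window=torch.hann_window(N_FFT), return_complex=True)
    return spec.abs().transpose(1, 2)                        # (B, T, F)


class WatermarkEmbedder(nn.Module):
    """Hypothetical embedder: predicts a small feature perturbation carrying the bits."""
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(N_FFT // 2 + 1 + N_BITS, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out_proj = nn.Linear(D_MODEL, N_FFT // 2 + 1)

    def forward(self, feats, bits):
        bits_rep = bits.unsqueeze(1).expand(-1, feats.size(1), -1)  # broadcast bits per frame
        x = self.in_proj(torch.cat([feats, bits_rep], dim=-1))
        delta = self.out_proj(self.encoder(x))
        return feats + 0.01 * delta                          # small, ideally imperceptible change


class WatermarkDecoder(nn.Module):
    """Hypothetical decoder: recovers the embedded bits from (possibly attacked) features."""
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(N_FFT // 2 + 1, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_BITS)

    def forward(self, feats):
        h = self.encoder(self.in_proj(feats)).mean(dim=1)    # pool over time
        return self.head(h)                                  # bit logits


# One illustrative training step with a simulated additive-noise attack.
wave = torch.randn(2, 16000)                                 # dummy 1-second clips
bits = torch.randint(0, 2, (2, N_BITS)).float()
embedder, decoder = WatermarkEmbedder(), WatermarkDecoder()

feats = stft_features(wave)
marked = embedder(feats, bits)
attacked = marked + 0.05 * torch.randn_like(marked)          # noise as a stand-in for attacks
loss = nn.functional.binary_cross_entropy_with_logits(decoder(attacked), bits) \
     + nn.functional.mse_loss(marked, feats)                 # robustness + imperceptibility terms
loss.backward()
```

In this kind of setup, the imperceptibility term keeps the embedded perturbation small while the decoding loss, computed after the simulated attack, pushes the payload to survive distortions; the actual paper's attack simulation and multi-scale blocks would replace the toy noise and plain encoder layers used here.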
Pages: 4822 - 4837
Number of pages: 16