STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams

Cited by: 4
Authors
Tian, Hui [1 ,2 ]
Qiu, Yiqin [1 ,2 ]
Mazurczyk, Wojciech [3 ]
Li, Haizhou [4 ,5 ]
Qian, Zhenxing [6 ]
Affiliations
[1] Natl Huaqiao Univ, Coll Comp Sci & Technol, Xiamen 361021, Peoples R China
[2] Xiamen Key Lab Data Secur & Blockchain Technol, Xiamen 361021, Peoples R China
[3] Warsaw Univ Technol, Fac Elect & Informat Technol, Inst Comp Sci, PL-00665 Warsaw, Poland
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[6] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Delays; Feature extraction; Steganography; Quantization (signal); Distortion; Speech coding; Resistance; Steganalysis; steganography; voice over Internet protocol; speech streams; deep neural networks; pitch delays; STEGANOGRAPHY; SCHEME; VOICE;
DOI
10.1109/TASLP.2022.3224295
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206 ; 082403 ;
Abstract
The real-time detection of speech steganography in Voice-over-Internet-Protocol (VoIP) scenarios remains an open problem, because steganalysis methods must perform well on low-intensity embeddings and short input samples while also delivering rapid detection results. To address these challenges, this paper presents a novel steganalysis model based on spatial and temporal feature fusion (STFF-SM). Unlike existing methods, we take both the integer and fractional pitch delays as input, and design a subframe-stitch module to integrate subframe-wise integer delays and frame-wise fractional pitch delays. Further, we design a spatial fusion module based on pre-activation residual convolution to extract pitch spatial features and gradually increase their dimensions, so that finer steganographic distortions can be discovered and detection improved; a Group-Squeeze-Weighting block is introduced to alleviate the information loss incurred when increasing the feature dimension. In addition, we design a temporal fusion module to extract pitch temporal features using a stacked LSTM, where a Gated Feed-Forward Network is introduced to learn the interaction between different feature maps while suppressing features that are not useful for detection. We evaluated the performance of STFF-SM through comprehensive experiments and comparisons with state-of-the-art solutions. The experimental results demonstrate that STFF-SM meets the needs of real-time detection of speech steganography in VoIP streams and outperforms existing methods in detection performance, especially at low embedding strengths and short window sizes.
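As a rough illustration of the "pre-activation residual convolution" idea named in the abstract, the sketch below applies the activation before the convolution and then adds the identity shortcut. It is not the STFF-SM implementation: the function names, the toy input, and the kernels are hypothetical, normalization is omitted, and the real model operates on learned multi-channel feature maps rather than a single list of numbers.

```python
# Illustrative sketch only (pure Python, no dependencies).
# All names and values are hypothetical and not taken from the paper.

def relu(seq):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in seq]


def conv1d_same(seq, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(seq) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(seq))
    ]


def preact_residual(seq, kernel):
    """Pre-activation ordering: activate, convolve, then add the identity shortcut."""
    branch = conv1d_same(relu(seq), kernel)
    return [x + b for x, b in zip(seq, branch)]


# Toy, mean-centred "pitch delay" values (hypothetical numbers).
delays = [-1.0, 2.0, 0.5, -0.5]
print(preact_residual(delays, [0.0, 1.0, 0.0]))
```

Because the shortcut carries the unmodified input forward, a branch whose kernel is all zeros leaves the sequence unchanged, which is why residual blocks can be stacked deeply without degrading the signal.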
Pages: 277-289
Number of pages: 13