STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams

Cited by: 4
Authors
Tian, Hui [1, 2]
Qiu, Yiqin [1, 2]
Mazurczyk, Wojciech [3]
Li, Haizhou [4, 5]
Qian, Zhenxing [6]
Affiliations
[1] Natl Huaqiao Univ, Coll Comp Sci & Technol, Xiamen 361021, Peoples R China
[2] Xiamen Key Lab Data Secur & Blockchain Technol, Xiamen 361021, Peoples R China
[3] Warsaw Univ Technol, Fac Elect & Informat Technol, Inst Comp Sci, PL-00665 Warsaw, Poland
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[6] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Delays; Feature extraction; Steganography; Quantization (signal); Distortion; Speech coding; Resistance; Steganalysis; voice over Internet protocol; speech streams; deep neural networks; pitch delays; SCHEME; VOICE
DOI
10.1109/TASLP.2022.3224295
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The real-time detection of speech steganography in Voice-over-Internet-Protocol (VoIP) scenarios remains an open problem, because it requires steganalysis methods to handle low-intensity embeddings and short input samples while still returning results quickly. To address these challenges, this paper presents a novel steganalysis model based on spatial and temporal feature fusion (STFF-SM). Unlike existing methods, we take both the integer and fractional pitch delays as input, and design a subframe-stitch module to integrate subframe-wise integer delays with frame-wise fractional pitch delays. Further, we design a spatial fusion module based on pre-activation residual convolution that extracts pitch spatial features and gradually increases their dimensions to expose finer steganographic distortions and thereby improve detection; a Group-Squeeze-Weighting block is introduced to alleviate the information loss incurred as the feature dimension increases. In addition, we design a temporal fusion module that extracts pitch temporal features using a stacked LSTM, in which a Gated Feed-Forward Network learns the interaction between different feature maps while suppressing features that are not useful for detection. We evaluate STFF-SM through comprehensive experiments and comparisons with state-of-the-art solutions. The experimental results demonstrate that STFF-SM meets the needs of real-time detection of speech steganography in VoIP streams and outperforms existing methods in detection performance, especially at low embedding strengths and short window sizes.
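As a rough illustration of the "pre-activation residual convolution" mentioned in the abstract, the following is a minimal pure-Python sketch of a generic one-dimensional pre-activation residual unit (normalize, then ReLU, then convolve, twice, plus an identity shortcut). It is not the authors' implementation: the function names, the per-sequence standardization used in place of batch normalization, the kernel values, and the sample pitch-delay sequence are all assumptions made for illustration only.

```python
import math


def standardize(xs, eps=1e-5):
    """Assumed stand-in for batch normalization: zero mean, unit variance over one sequence."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def conv1d_same(xs, kernel):
    """1-D cross-correlation with zero padding; expects an odd-length kernel so the length is kept."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(xs) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(xs))
    ]


def preact_residual_block(xs, kernel1, kernel2):
    """Pre-activation ordering: (normalize -> ReLU -> conv) twice, then add the identity shortcut."""
    h = conv1d_same(relu(standardize(xs)), kernel1)
    h = conv1d_same(relu(standardize(h)), kernel2)
    return [x + r for x, r in zip(xs, h)]


# Hypothetical pitch-delay sequence (illustrative values, not taken from the paper).
pitch_delays = [40.0, 41.0, 39.0, 42.0, 40.0, 38.0, 41.0, 43.0]
features = preact_residual_block(pitch_delays, [0.25, 0.5, 0.25], [-1.0, 0.0, 1.0])
```

Because the shortcut adds the unmodified input back onto the convolved branch, the block reduces to the identity when the branch outputs zeros, which is the property that makes stacks of such units easy to train.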
Pages: 277-289
Page count: 13
Related papers
50 in total
  • [31] DCA FEATURE FUSION TERRAIN CLASSIFICATION BASED ON SPATIAL PYRAMID MODEL
    Li C.
    Xu L.
    Wang L.
    Wang H.
    Shen T.
    Taiyangneng Xuebao/Acta Energiae Solaris Sinica, 2023, 44 (09): 334-339
  • [32] Locating Steganalysis of LSB Matching Based on Spatial and Wavelet Filter Fusion
    Yang, Chunfang
    Wang, Jie
    Lin, Chengliang
    Chen, Huiqin
    Wang, Wenjuan
    CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 60 (02): 633-644
  • [33] A Spatial-Temporal Feature Fusion Strategy for Skeleton-Based Action Recognition
    Chen, Yitian
    Xu, Yuchen
    Xie, Qianglai
    Xiong, Lei
    Yao, Leiyue
    2023 INTERNATIONAL CONFERENCE ON DATA SECURITY AND PRIVACY PROTECTION, DSPP, 2023: 207-215
  • [34] Multimodal motor imagery decoding method based on temporal spatial feature alignment and fusion
    Zhang, Yukun
    Qiu, Shuang
    He, Huiguang
    JOURNAL OF NEURAL ENGINEERING, 2023, 20 (02)
  • [35] Weakly supervised video anomaly detection based on spatial–temporal feature fusion enhancement
    Liang, Weijie
    Zhang, Jianming
    Zhan, Yongzhao
    Signal, Image and Video Processing, 2024, 18: 1111-1118
  • [36] Malicious Traffic Detection Algorithm for the Internet of Things Based on Temporal Spatial Feature Fusion
    Zhang, Linzhong
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (11): 646-656
  • [37] Steganalysis of AMR Speech Stream Based on Multi-Domain Information Fusion
    Guo, Chuanpeng
    Yang, Wei
    Huang, Liusheng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 4077-4090
  • [38] Spatial-Frequency Feature Fusion Network for Lightweight and Arbitrary-Sized JPEG Steganalysis
    Liu, Xulong
    Li, Weixiang
    Lin, Kaiqing
    Li, Bin
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 2585-2589
  • [39] Spatial-Temporal Feature Fusion for Human Fall Detection
    Ma, Xin
    Wang, Haibo
    Xue, Bingxia
    Li, Yibin
    COMPUTER VISION, CCCV 2015, PT I, 2015, 546: 438-447
  • [40] Alpha-numeric hand gesture recognition based on fusion of spatial feature modelling and temporal feature modelling
    Yang, C.
    Ku, B.
    Han, D. K.
    Ko, H.
    ELECTRONICS LETTERS, 2016, 52 (20): 1679-1680