STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams

Cited by: 4
Authors
Tian, Hui [1, 2]
Qiu, Yiqin [1, 2]
Mazurczyk, Wojciech [3]
Li, Haizhou [4, 5]
Qian, Zhenxing [6]
Affiliations
[1] Natl Huaqiao Univ, Coll Comp Sci & Technol, Xiamen 361021, Peoples R China
[2] Xiamen Key Lab Data Secur & Blockchain Technol, Xiamen 361021, Peoples R China
[3] Warsaw Univ Technol, Fac Elect & Informat Technol, Inst Comp Sci, PL-00665 Warsaw, Poland
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[6] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Delays; Feature extraction; Steganography; Quantization (signal); Distortion; Speech coding; Resistance; Steganalysis; steganography; voice over Internet protocol; speech streams; deep neural networks; pitch delays; STEGANOGRAPHY; SCHEME; VOICE;
DOI
10.1109/TASLP.2022.3224295
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
The real-time detection of speech steganography in Voice-over-Internet-Protocol (VoIP) scenarios remains an open problem, as it requires steganalysis methods to perform well on low-intensity embeddings and short-sample inputs while also delivering detection results rapidly. To address these challenges, this paper presents a novel steganalysis model based on spatial and temporal feature fusion (STFF-SM). Unlike existing methods, we take both the integer and fractional pitch delays as input, and design a subframe-stitch module to integrate subframe-wise integer delays and frame-wise fractional pitch delays. Further, we design a spatial fusion module based on pre-activation residual convolution to extract pitch spatial features and gradually increase their dimensions, so that finer steganographic distortions can be discovered and detection improved; a Group-Squeeze-Weighting block is introduced to alleviate the information loss incurred when increasing the feature dimension. In addition, we design a temporal fusion module that extracts pitch temporal features using a stacked LSTM, where a Gated Feed-Forward Network is introduced to learn the interaction between different feature maps while suppressing features that are not useful for detection. We evaluated STFF-SM through comprehensive experiments and comparisons with state-of-the-art solutions. The experimental results demonstrate that STFF-SM meets the needs of real-time detection of speech steganography in VoIP streams and outperforms the existing methods in detection performance, especially at low embedding strengths and short window sizes.
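The input handling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain concatenation stands in for the subframe-stitch module, an element-wise sigmoid gate stands in for the gating idea behind the Gated Feed-Forward Network, and the names `stitch_frames`, `sliding_windows`, `gate`, and the parameter `subframes_per_frame` are all hypothetical. The number of subframes per frame and the encoding of the fractional delay depend on the codec, which the abstract does not specify.

```python
import math


def stitch_frames(int_delays, frac_delays, subframes_per_frame):
    """Build one feature row per frame.

    Assumption: each frame contributes `subframes_per_frame` integer pitch
    delays and one fractional pitch delay. Plain concatenation is used here
    as a stand-in for the paper's subframe-stitch module.
    """
    if len(int_delays) != len(frac_delays) * subframes_per_frame:
        raise ValueError("integer and fractional delay counts do not match")
    rows = []
    for frame_idx, frac in enumerate(frac_delays):
        start = frame_idx * subframes_per_frame
        rows.append(list(int_delays[start:start + subframes_per_frame]) + [frac])
    return rows


def sliding_windows(rows, window, hop):
    """Cut the frame sequence into fixed-size detection windows."""
    return [rows[i:i + window] for i in range(0, len(rows) - window + 1, hop)]


def gate(values, gate_logits):
    """Element-wise sigmoid gating: a gate near 0 suppresses a feature."""
    return [v / (1.0 + math.exp(-g)) for v, g in zip(values, gate_logits)]


if __name__ == "__main__":
    # Two frames, four subframes each (hypothetical values).
    rows = stitch_frames([40, 41, 42, 43, 50, 51, 52, 53], [1, 2], 4)
    print(rows)
    print(sliding_windows(rows, window=1, hop=1))
    print(gate([2.0, 3.0], [0.0, -20.0]))
```

A shorter window yields a verdict after fewer frames, which is why the abstract treats short window sizes as the harder and more relevant case for real-time detection.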
Pages: 277-289
Number of pages: 13