Resistance spot welding is a key process in body-in-white production, and the quality of the weld nugget directly affects the safety performance of the whole vehicle. At present, nugget quality is inspected mainly by hand, which is labor-intensive and inefficient. This paper therefore explores a method for detecting weld nugget quality automatically and efficiently by analyzing the vibration excitation response signal of the welded joint. Because the raw signals are large in volume, heavily contaminated by noise, and poorly discriminable, we construct a deep learning model named Real Spatial-temporal Attention Denoising Network (RSTADN), which consists of a denoising module, spatial-temporal attention modules, and multiple residual modules. The denoising module uses global absolute average pooling (GAAP) to aggregate global information from each channel while retaining as much of the original signal characteristics as possible, and it generates appropriate soft thresholds that eliminate noise and enhance the feature recognition ability of the model. The spatial-temporal attention modules capture the spatial-temporal correlation features of the signal from three perspectives (real spatial scale, short-term dependency, and global temporal interaction), which strengthens the model's feature extraction ability. The residual modules further extract signal features to achieve a precise mapping between features and nugget quality states. Experimental results show that RSTADN reaches an accuracy of 94.35% in the weld nugget quality detection task, which is at least 1.31% higher than that of existing models.
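
The denoising idea described above (pool the absolute value of each channel, derive a per-channel threshold, then soft-threshold the signal) can be illustrated with a minimal NumPy sketch. It is not the RSTADN implementation: the abstract does not say how the thresholds are produced from the GAAP output, so the sketch assumes the threshold is the pooled value multiplied by a per-channel factor in (0, 1). In a trained network that factor would presumably be learned; here it is a fixed, hypothetical input named `scale`. The function names are likewise illustrative.

```python
import numpy as np


def global_abs_avg_pool(x):
    """Global absolute average pooling over the time axis.

    x has shape (channels, time); the result has shape (channels,).
    """
    return np.mean(np.abs(x), axis=-1)


def soft_threshold(x, tau):
    """Shrink every value toward zero by tau; values with |x| <= tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def denoise(x, scale):
    """Soft-threshold each channel with a threshold derived from its GAAP value.

    `scale` is a hypothetical per-channel factor in (0, 1). It stands in for
    whatever the network would learn and is an assumption of this sketch.
    """
    gaap = global_abs_avg_pool(x)            # (channels,)
    tau = np.asarray(scale) * gaap           # per-channel threshold
    return soft_threshold(x, tau[:, None])   # broadcast over the time axis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 512)
    # Two synthetic channels: a decaying oscillation plus Gaussian noise.
    clean = np.stack([np.exp(-5 * t) * np.sin(2 * np.pi * f * t) for f in (40, 90)])
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)
    denoised = denoise(noisy, scale=[0.5, 0.5])
    print(denoised.shape)
```

Because the threshold scales with each channel's own mean absolute amplitude, low-amplitude samples are set to zero while larger excursions are kept and only shrunk, which is the behavior the abstract attributes to the soft thresholds.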