The main objective of unsupervised anomalous sound detection (ASD) is to identify anomalous sounds when only normal sound samples are available for training. Existing ASD methods rely primarily on generative and discriminative models. Among generative approaches, the autoencoder (AE) is widely used for anomaly detection; however, owing to the "shortcut" problem, it often misclassifies anomalous samples as normal. Discriminative methods, in contrast, perform well but often suffer from poor stability. This work introduces an architecture named the self-supervised classification deep hierarchical reconstruction network (SCDHR), which combines generative and discriminative model structures. The network processes the input with convolutional kernels of different sizes in parallel branches to extract more discriminative features. In addition, a symmetric fusion attention (SFA) module is proposed; by integrating temporal, frequency, and coordinate attention across branches, it strengthens the model's ability to select relevant features. Furthermore, a one-class center loss is combined with the standard center loss to obtain more compact feature representations, improving the model's ability to distinguish anomalous samples. Finally, the proposed method is validated on the DCASE 2023 Task 2 dataset, achieving a harmonic mean of the AUC and pAUC scores of 65.17% on the Development Dataset and 68.16% on the Evaluation Dataset, outperforming state-of-the-art methods.
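
Since the abstract only names the loss design, the following minimal PyTorch sketch illustrates one way a standard per-class center loss can be combined with a one-class center loss that pulls all normal embeddings toward a single shared center. The class name `CombinedCenterLoss`, the learnable centers, and the weighting factor `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class CombinedCenterLoss(nn.Module):
    """Illustrative combination of a standard (per-class) center loss with a
    one-class center loss; the weighting `lam` and the learnable centers are
    assumptions, not the paper's exact formulation."""
    def __init__(self, num_classes: int, feat_dim: int, lam: float = 0.5):
        super().__init__()
        # One learnable center per auxiliary (self-supervised) class.
        self.class_centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        # A single learnable center shared by all normal samples.
        self.global_center = nn.Parameter(torch.randn(feat_dim))
        self.lam = lam

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Standard center loss: squared distance to each sample's class center.
        center_loss = ((feats - self.class_centers[labels]) ** 2).sum(dim=1).mean()
        # One-class center loss: squared distance to the shared normal center.
        one_class_loss = ((feats - self.global_center) ** 2).sum(dim=1).mean()
        return center_loss + self.lam * one_class_loss

# Example usage with random embeddings and self-supervised class labels.
loss_fn = CombinedCenterLoss(num_classes=10, feat_dim=128)
feats = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
loss = loss_fn(feats, labels)
```

Intuitively, the per-class term keeps embeddings of each auxiliary class compact, while the one-class term tightens the overall normal-data cluster, so anomalous samples fall farther from the learned centers at test time.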