Anomaly detection in dynamic scenarios, such as fight detection in skeleton data, remains a challenging task due to diverse motion patterns and environmental factors. In this study, a novel approach employing a hybrid transformer-based variational autoencoder framework tailored to skeleton datasets is introduced. By extracting dedicated features that capture joint velocities and inter-joint differences, the model gains insight into the dynamics of mutual interactions. Notably, a dynamic thresholding technique is employed to adaptively detect anomalies, enhancing the model's adaptability and resilience to varying data conditions. By leveraging mutual-action skeleton data, our method effectively distinguishes between fighting and non-fighting activities. Unlike conventional reconstruction-based methods or future-frame prediction techniques, our model integrates transformer architectures with variational autoencoder principles and an anomaly-scoring scheme. This combination addresses the limitations of existing approaches, offering improved anomaly detection capabilities. The model's output includes encoded representations, decoded outputs, and anomaly scores, facilitating straightforward separation of the fighting and non-fighting classes. Experimental results demonstrate the robustness and effectiveness of the methodology, achieving performance scores of 72.6% on the NTU-60 and 76.8% on the NTU-120 mutual-action skeleton datasets in accurately discerning anomalous behaviors, particularly in the context of mutual actions.
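
To make two of the ideas summarized above concrete, the following minimal sketch (not the authors' released code) illustrates, under assumed conventions, how velocity and inter-person difference features could be derived from mutual-action skeletons, and how a dynamic threshold could be applied to anomaly scores. The array layout, the rolling mean-plus-k-standard-deviations rule, and the hyperparameters `window` and `k` are illustrative assumptions, not values taken from the paper; the transformer-VAE itself is replaced by a stand-in score.

```python
# Minimal illustrative sketch (assumptions, not the paper's implementation):
# (1) velocity / inter-person difference features from mutual-action skeletons,
# (2) a dynamic threshold over anomaly scores.
import numpy as np

def extract_motion_features(skeletons: np.ndarray) -> np.ndarray:
    """skeletons: (T, P, J, 3) -- frames, persons (2 for mutual actions), joints, xyz.

    Returns per-frame features concatenating joint velocities and
    person-to-person joint differences (assumed feature definition).
    """
    T, P, J, C = skeletons.shape
    # Joint velocities: first-order temporal differences, padded at t = 0.
    velocities = np.diff(skeletons, axis=0, prepend=skeletons[:1])
    # Inter-person differences: joint-wise offset between the two actors.
    inter_diff = skeletons[:, 0] - skeletons[:, 1]              # (T, J, 3)
    return np.concatenate(
        [velocities.reshape(T, -1), inter_diff.reshape(T, -1)], axis=1
    )

def dynamic_threshold(scores: np.ndarray, window: int = 64, k: float = 2.5) -> np.ndarray:
    """Per-step threshold = rolling mean + k * rolling std of recent scores.

    `window` and `k` are illustrative hyperparameters, not taken from the paper.
    """
    thresholds = np.empty_like(scores)
    for t in range(len(scores)):
        hist = scores[max(0, t - window): t + 1]
        thresholds[t] = hist.mean() + k * hist.std()
    return thresholds

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    skel = rng.normal(size=(300, 2, 25, 3))       # NTU-style 25-joint skeletons
    feats = extract_motion_features(skel)
    # Stand-in for the transformer-VAE anomaly score (e.g. reconstruction error + KL term).
    scores = rng.gamma(shape=2.0, scale=1.0, size=len(feats))
    flags = scores > dynamic_threshold(scores)
    print(f"features: {feats.shape}, flagged frames: {flags.sum()}")
```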