Emergency Siren Recognition (ESR) is a relevant task in audio processing, useful for the development of driver assistance systems or wearable devices that generate alternative alerts to make the user aware of emergency signals in the vicinity. In this context, an effective ESR solution involves deploying the model on a resource-constrained device, where aspects such as energy consumption and memory size are critical. In this work, we studied and applied two state-of-the-art deep learning architectures to ESR: a modified version of the well-known GhostNet model and an end-to-end 1D convolutional neural network (CNN). We assessed classification performance as well as computational metrics for an implementation on a low-power device (STM32F407VGT6). Different sampling rates, signal lengths, and sound representations were tested on three publicly available datasets: ESC-50, US8K, and AudioSet. The best-performing model on ESC-50 was GhostNet, achieving an F-score of 0.96 +/- 0.01 with a Multiply-and-Accumulate Complexity (MACC) of 0.517M, whereas the 1D CNN obtained the best F-score on US8K (0.93 +/- 0.05) with an MACC of 27.125M. Additionally, we verified that 32-filter log-Mel spectrograms computed on 1.5-s signals sampled at 16000 Hz led to the best performances. Interestingly, the most efficient model was GhostNet trained on 32-filter MFCCs of 1-s signals sampled at 8820 Hz, which achieved an F-score of 0.92 +/- 0.07 on ESC-50, just 0.04 below the best overall performance, but with a 33% lower MACC (0.347M) and 40% less running time.
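As an illustration of the front end described above, the following is a minimal NumPy sketch of log-Mel spectrogram extraction for the best-performing configuration (32 mel filters, 1.5-s signals at 16000 Hz). The STFT parameters (`n_fft=512`, `hop=256`) and the filterbank construction are assumptions for illustration, not the exact pipeline used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale, from 0 Hz to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_filters=32):
    # Frame the signal, apply a Hann window, and take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Project onto the mel filterbank and compress with a log.
    mel = power @ mel_filterbank(n_filters, n_fft, sr).T
    return np.log(mel + 1e-10)

# 1.5-s signal at 16 kHz, matching the best-performing configuration.
x = np.random.randn(int(1.5 * 16000))
S = log_mel_spectrogram(x)
print(S.shape)  # (n_frames, 32)
```

A corresponding MFCC front end (the most efficient configuration) would apply a discrete cosine transform along the filter axis of this log-Mel output.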