Short-term residential load forecasting (STRLF) is critical for the safe and stable operation of microgrid systems. Because of shared conditions such as temperature and holiday effects, households in the same region may exhibit similar consumption patterns. However, existing STRLF methods focus mainly on the temporal patterns of a single household and generally ignore the spatial correlations between multiple households. To address this gap, a spatial and temporal attention-enabled transformer model, STformer, is proposed to extract the dynamic spatial and nonlinear temporal correlations between residential units and to perform joint prediction of multivariate residential loads. The combination of improved temporal attention and spatial attention mechanisms allows the proposed method to capture complex spatial and temporal factors without prior geographical information. The Monte Carlo (MC) dropout method is used to extend the proposed model to multitask residential probabilistic load forecasting. Compared with the Transformer, the proposed model improves the individual point forecast accuracy for New York (NY), USA, and Los Angeles (LA), USA, by 16.54% and 6.95%, and the combined point forecast accuracy by 22.46% and 11.86%, respectively. In addition, compared with SGPR, the proposed model improves the residential probabilistic load prediction accuracy by 10.21% and 11.07% in NY and LA, respectively.
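
To illustrate the MC dropout idea referred to above, the following is a minimal sketch, assuming a toy two-layer feed-forward network in NumPy rather than the STformer architecture itself. The function name `mc_dropout_forecast`, the weight matrices, the dropout rate, and the 5%/95% interval are all hypothetical choices made for illustration. Dropout is kept active at inference time, and repeated stochastic forward passes give a mean forecast and an empirical prediction interval for several households at once.

```python
import numpy as np


def mc_dropout_forecast(x, w1, w2, drop_rate=0.1, n_samples=200, seed=0):
    """Return (mean, interval) from n_samples forward passes with dropout on.

    x  : past load window, shape (window,)
    w1 : hidden-layer weights, shape (window, hidden)      (hypothetical)
    w2 : output weights, shape (hidden, n_households)      (hypothetical)
    """
    rng = np.random.default_rng(seed)
    hidden = np.maximum(x @ w1, 0.0)  # ReLU activations, shape (hidden,)
    keep = 1.0 - drop_rate
    samples = []
    for _ in range(n_samples):
        # Draw a fresh dropout mask on every pass (inverted-dropout scaling).
        mask = rng.random(hidden.shape) < keep
        samples.append((hidden * mask / keep) @ w2)
    samples = np.stack(samples)  # shape (n_samples, n_households)
    # Point forecast and an empirical 5%-95% interval per household.
    return samples.mean(axis=0), np.quantile(samples, [0.05, 0.95], axis=0)


# Toy usage with random data: a 24-step window and 3 households.
rng = np.random.default_rng(42)
x = rng.random(24)
w1 = rng.normal(size=(24, 16))
w2 = rng.normal(size=(16, 3))
mean, interval = mc_dropout_forecast(x, w1, w2)
print("mean forecast:", mean)
print("lower bound:", interval[0])
print("upper bound:", interval[1])
```

In a trained model the weights would come from fitting on historical load data; here they are random, so only the shapes and the sampling procedure are meaningful.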