Although most state-of-the-art building extraction methods generate precise binary segmentation masks, geographic and cartographic applications typically require vectorized building footprints rather than rasterized output. Current vectorized footprint extraction methods for multiple buildings have yet to address the confusion caused by the broken lines formed where the slopes of a building roof meet. These methods also produce shape errors, such as incomplete and irregular boundaries. To address these issues, this study proposes a new building outline extraction method, SANET, which uses a transformer block to capture refined boundaries and footprint features. A shape-aware loss function is designed to constrain building shapes and optimize boundary feature generation. Because incomplete and irregular boundaries and intersecting outlines aggravate the shape errors of vectorized building objects, a Fourier descriptor is computed for the footprint generation model to provide prior shape knowledge to the shape constraint module. Experiments on the WHU and SpaceNet datasets show that the proposed method achieves state-of-the-art performance, with higher average precision (AP) and recall than other contour-based methods. The proposed shape constraints yield complete and shape-correct building boundaries, and they markedly reduce incomplete outlines and smoothed corners.
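
As an illustration of the kind of shape prior a Fourier descriptor can provide, the following is a minimal sketch and not the implementation used in SANET. It assumes a building outline given as an ordered list of (x, y) vertices, treats the boundary points as complex numbers, and keeps the magnitudes of a few low-frequency Fourier coefficients. The function names `sample_polygon` and `fourier_descriptor`, the number of harmonics, and the normalization by the largest magnitude are all assumptions made for this example.

```python
import cmath
import math


def sample_polygon(vertices, points_per_edge=16):
    """Resample a closed polygon into ordered boundary points (hypothetical helper)."""
    points = []
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        for j in range(points_per_edge):
            t = j / points_per_edge
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


def fourier_descriptor(contour, n_harmonics=4):
    """Return normalized magnitudes of low-frequency Fourier coefficients.

    The zero-frequency term is skipped (translation invariance), only
    magnitudes are kept (rotation and start-point invariance), and the
    result is divided by the largest magnitude (scale invariance).
    """
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    harmonics = list(range(1, n_harmonics + 1)) + list(range(-n_harmonics, 0))
    magnitudes = []
    for k in harmonics:
        # Direct discrete Fourier transform for harmonic k.
        c = sum(p * cmath.exp(-2j * math.pi * k * m / n) for m, p in enumerate(z)) / n
        magnitudes.append(abs(c))
    peak = max(magnitudes)
    return [m / peak for m in magnitudes]


square = sample_polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
l_shape = sample_polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])
print(fourier_descriptor(square))
print(fourier_descriptor(l_shape))
```

Because the descriptor is unchanged by translating, rotating, or uniformly scaling the outline, it summarizes shape alone, which is one plausible reason such a descriptor suits a shape constraint. How the descriptor enters the loss in the proposed method is not specified in this abstract.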