Tower cranes are essential ground facilities in urban construction and play a crucial role in site safety and urbanization. After comparing several algorithms from the YOLO series, we propose a novel network based on the YOLOv5 network structure to detect tower cranes in remote sensing images (Google Earth images). Our modifications to the YOLOv5x network comprise using all the features extracted by the backbone network, improving the feature fusion network, and introducing a channel attention module. To evaluate the model, we created a tower crane object detection dataset for training and selected tower crane areas in Jiangsu Province, China, as the test data. On this test data, our method achieved an average precision (AP) of 77.41%. Compared with the original YOLOv5x, our approach improved accuracy by 1.03%, recall by 2.48%, F1 score by 2.87%, and AP by 3.10%. We also conducted comparative experiments with popular one-stage detection algorithms. Compared with YOLOv3, YOLOv5, YOLOv7x, and YOLOv8x, the AP of our method is higher by 8.06%, 11.57%, 18.47%, and 6.66%, respectively, indicating promising potential for future tower crane inspections. Finally, we conducted an in-depth analysis of the tower crane distribution in Kunshan City, Jiangsu Province, to verify our theory.
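For readers less familiar with the reported metrics, the following is a minimal sketch of how recall, F1 score, and the precision they depend on are conventionally computed from detection counts. It assumes the standard definitions based on true positives, false positives, and false negatives; the function name `detection_metrics` and the example counts are hypothetical and are not taken from this work, and the rule for matching a predicted box to a ground-truth tower crane (for example, an IoU threshold) is not specified here.

```python
from typing import Tuple


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) from detection counts.

    tp: predicted boxes matched to a ground-truth object
    fp: predicted boxes with no matching ground-truth object
    fn: ground-truth objects that no prediction matched
    """
    # Guard against division by zero when there are no predictions or no objects.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from this work).
p, r, f1 = detection_metrics(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

AP summarizes the precision-recall trade-off over all confidence thresholds, so it is not recoverable from a single set of counts like the one above.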