Object detection (OD) in remote sensing (RS) has attracted considerable attention with the rapid development of deep learning. However, RS images typically exhibit large variations in object scale, densely arranged small objects, and indistinguishable boundaries between objects and background. These characteristics expose shortcomings of existing methods, such as insufficient feature extraction and information loss. To address these issues, we present a novel OD method, GDRS-YOLO, built on the you-only-look-once version 7 (YOLOv7) architecture. Our primary contributions are as follows. First, an enhanced feature extraction network based on deformable convolution is proposed to improve the network's ability to model geometric transformations. Second, we abandon the traditional feature pyramid architecture and construct a multiscale feature aggregation network based on the gather-and-distribute mechanism, which makes effective use of the features obtained from the backbone and reduces information loss during transmission. Finally, the normalized Wasserstein distance (NWD) is introduced for hybrid loss training, which alleviates the sensitivity of the intersection-over-union (IoU)-based metric to the location deviation of tiny objects. We demonstrate the effectiveness of GDRS-YOLO on the publicly available NWPU VHR-10 and VisDrone datasets. Compared with the original YOLOv7, the proposed method improves the mean average precision (mAP) by 1.9% and 5.5%, respectively. These results highlight the superior performance of the proposed model, which provides an efficient multiscale feature fusion solution for RS applications.
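For concreteness, the following is a minimal PyTorch sketch of how an NWD term can be combined with an IoU-based regression loss in the hybrid formulation described above. The closed-form Wasserstein distance follows the standard formulation for axis-aligned boxes modeled as 2-D Gaussians; the normalization constant `c`, the mixing weight `alpha`, and the function names are illustrative assumptions, not the exact values or implementation used in GDRS-YOLO.

```python
import torch

def nwd(pred, target, c=12.8, eps=1e-7):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).

    Each box is modeled as a 2-D Gaussian; the squared 2-Wasserstein
    distance between two such Gaussians has the closed form below.
    `c` is a dataset-dependent normalization constant (placeholder value).
    """
    cx1, cy1, w1, h1 = pred.unbind(-1)
    cx2, cy2, w2, h2 = target.unbind(-1)
    # Closed-form squared 2-Wasserstein distance for axis-aligned boxes.
    w2_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2 \
            + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2
    return torch.exp(-torch.sqrt(w2_sq.clamp(min=eps)) / c)

def hybrid_box_loss(pred, target, iou, alpha=0.5):
    """Mix an IoU-based loss with an NWD-based loss.

    `iou` is the (C)IoU value computed elsewhere in the detector;
    `alpha` is an assumed weighting factor, not the paper's setting.
    """
    loss_iou = 1.0 - iou
    loss_nwd = 1.0 - nwd(pred, target)
    return alpha * loss_iou + (1.0 - alpha) * loss_nwd
```

The intent of such a mix is that the IoU term retains its usual behavior for medium and large objects, while the exponential NWD term stays smooth for tiny boxes whose IoU changes sharply, or drops to zero, under small location deviations.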