Desert encroachment threatens the space in which people live and work, and human-directed vegetation restoration is one of the effective ways to halt desert expansion. During restoration, counting tree saplings to rapidly assess the survival rate of planted vegetation (such as Haloxylon ammodendron) is a critical task. However, traditional ground-based survey methods are resource-intensive and time-consuming. This paper proposes a novel unsupervised fine segmentation framework, termed GDPGO-SAM, in which the Segment Anything Model (SAM) is driven by Grounding DINO prompt generation and optimization. The framework is designed to segment desert vegetation in UAV-derived remote sensing imagery, thereby facilitating the rapid inventory of sapling counts. It combines the Grounding DINO object detector with the pre-trained vision model SAM and employs a task-prior-based prompt optimization mechanism to effectively capture the intrinsic features of desert vegetation. The method achieves zero-shot instance segmentation of desert vegetation with an overall accuracy (OA) of 96.56%, a mean Intersection over Union (mIoU) of 81.50%, and a kappa coefficient of 0.782, overcoming the limitation of traditional supervised models, which rely on passive memorization rather than true recognition. This research significantly enhances the precision of vegetation extraction and canopy delineation, providing strong support for managing desert vegetation restoration and combating desert expansion.
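For readers unfamiliar with the three reported metrics, the following minimal sketch shows how OA, mIoU, and the kappa coefficient are conventionally computed from a predicted mask and a reference mask. It assumes a two-class setting (vegetation = 1, background = 0) with mIoU averaged over both classes; the abstract does not state the exact evaluation protocol, and the function names here are hypothetical rather than taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two equal-length sequences of 0/1 pixel labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return (OA, mIoU, kappa) under the assumed two-class setting."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("masks are empty")

    # Overall accuracy: fraction of pixels labelled correctly.
    oa = (tp + tn) / n

    # Per-class IoU; a class absent from both masks is counted as a perfect match.
    iou_veg = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_bg = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    miou = (iou_veg + iou_bg) / 2

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 1.0

    return oa, miou, kappa


if __name__ == "__main__":
    # Flattened toy masks, purely illustrative (not data from the paper).
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    print(segmentation_metrics(predicted, reference))
```

Because background pixels usually dominate desert imagery, OA tends to sit well above mIoU and kappa, which is consistent with the gap between the reported 96.56% OA and 81.50% mIoU.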