Monocular depth estimation (MDE) is a critical computer vision task that enhances environmental perception in fields such as autonomous driving and robot navigation. In recent years, deep learning-based MDE methods have achieved notable progress in these fields. However, robust MDE in low-altitude forest environments remains challenging, particularly in scenes with dense and cluttered foliage, which complicates applications in environmental monitoring, agriculture, and search and rescue operations. This paper presents a comprehensive evaluation of state-of-the-art deep learning-based MDE methods on low-altitude forest datasets. The evaluated models include both self-supervised and supervised approaches built on different network architectures, such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). We assess how well these approaches generalize across diverse low-altitude scenarios, with a specific focus on forested environments. The evaluation combines traditional image-based global statistical metrics with geometry-aware metrics to give a more complete picture of depth estimation performance. The results indicate that most Transformer-based models, such as DepthAnything and Metric3D, outperform traditional CNN-based models in complex forest environments by capturing detailed tree structures and depth discontinuities. Conversely, CNN-based models such as MiDaS and AdaBins struggle with depth discontinuities and complex occlusions, yielding less detailed predictions. On the Mid-Air dataset, the Transformer-based DepthAnything achieves a 54.2% improvement in RMSE, a global error metric, over the CNN-based AdaBins. On the LOBDM dataset, the CNN-based MiDaS has a depth edge completeness error of 93.361, whereas the Transformer-based Metric3D has a significantly lower error of only 5.494.
These findings highlight the potential of Transformer-based approaches for monocular depth estimation in low-altitude forest environments, with implications for high-throughput plant phenotyping, environmental monitoring, and other forest-specific applications.
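As a reading aid for the RMSE comparison quoted above, the following minimal sketch shows how a global RMSE and a relative improvement between two models could be computed. It is an illustration under stated assumptions, not the evaluation code used in this study: the function names `rmse` and `relative_improvement`, the convention that non-positive ground-truth depth marks invalid pixels, and the toy arrays are all hypothetical.

```python
import numpy as np


def rmse(pred, gt, valid=None):
    """Root mean squared error over valid ground-truth pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: non-positive depth marks missing ground truth.
        valid = gt > 0
    diff = pred[valid] - gt[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy 2x2 depth maps (metres); the pixel with gt == 0 is ignored.
gt = np.array([[2.0, 4.0], [0.0, 8.0]])
pred_baseline = np.array([[4.0, 6.0], [1.0, 10.0]])  # off by 2 m on valid pixels
pred_improved = np.array([[3.0, 5.0], [5.0, 9.0]])   # off by 1 m on valid pixels

rmse_baseline = rmse(pred_baseline, gt)
rmse_improved = rmse(pred_improved, gt)
improvement = relative_improvement(rmse_baseline, rmse_improved)
print(rmse_baseline, rmse_improved, improvement)
```

Under this convention, a positive improvement value means the second model has a lower error than the baseline, which is the sense in which a percentage improvement in RMSE is read here.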