Tomato leaf disease detection presents several challenges, including the similarity and variability of disease features, multi-scale changes, and interference from complex backgrounds. To address these, we propose a novel detection model, CAPNet, based on adaptive feature fusion and convolution enhancement. First, we introduce the Context-Guided Fusion Module (CGFM), which applies a channel attention mechanism to adaptively weight and fuse feature maps. By leveraging global average pooling and fully connected layers, CGFM captures inter-channel dependencies, enhancing feature expression and improving the model's ability to handle intra-class variability and inter-class similarity. Additionally, we integrate the Precise Feature Enhancement Module (PFEM) into the backbone network, which distinguishes disease features from background noise through multi-level feature fusion and innovative convolution operations, strengthening the model's detection capabilities in complex environments. Finally, we address the limitations of the YOLOv8 detection head in handling multi-scale disease features by introducing the Adaptive Receptive Field Attention Head (ARFAhead). This head combines spatial attention with adaptive receptive fields, allowing the receptive field size to be adjusted dynamically and thereby overcoming the limitations of fixed receptive fields under scale variation. Experiments on the Tomato Leaf Disease Dataset (TLDD), encompassing 9 disease types and 25,854 instances, demonstrate that CAPNet improves mAP@0.5 by 5.4% and mAP@[0.5:0.95] by 5.1% over the baseline model. Furthermore, CAPNet outperforms state-of-the-art models in detection performance. When tested on the PlantDoc and DDCVP datasets, CAPNet improves mAP@0.5 by 5% and 1.7%, respectively, over the baseline, demonstrating its generalization capability.
These findings confirm CAPNet's superior performance in complex natural environments, offering robust technical support for disease management in modern tomato cultivation.
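To make the channel-attention idea behind CGFM concrete, the following is a minimal NumPy sketch of a generic squeeze-and-excitation style gate: global average pooling, two fully connected layers, and a sigmoid that produces per-channel weights for fusing two feature maps. It is an illustration under stated assumptions, not the CGFM implementation: the function name `channel_attention_fuse`, the two-input setup, the complementary weighting `gate` / `1 - gate`, the reduction ratio `r`, and the random weights are all hypothetical.

```python
import numpy as np


def channel_attention_fuse(feat_a, feat_b, w1, w2):
    """Fuse two (C, H, W) feature maps using per-channel attention weights.

    Hypothetical helper for illustration only; the fusion rule is an
    assumption and is not taken from the CAPNet paper.
    """
    combined = feat_a + feat_b                     # (C, H, W) summary of both inputs
    squeezed = combined.mean(axis=(1, 2))          # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # first FC layer + ReLU -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # second FC layer + sigmoid -> (C,)
    gate = gate[:, None, None]                     # reshape to broadcast over H and W
    return gate * feat_a + (1.0 - gate) * feat_b   # per-channel convex combination


# Toy inputs: 8 channels, 4x4 spatial size, reduction ratio r = 2 (all assumed).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat_a = rng.standard_normal((C, H, W))
feat_b = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1        # stand-in for learned FC weights
w2 = rng.standard_normal((C, C // r)) * 0.1        # stand-in for learned FC weights

fused = channel_attention_fuse(feat_a, feat_b, w1, w2)
print(fused.shape)  # (8, 4, 4)
```

Because each gate value lies strictly between 0 and 1, every fused channel is a weighted blend of the two inputs; in a trained network the FC weights would be learned so that more informative channels receive larger weights.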