Accurate pre-harvest yield estimation facilitates more rational allocation of resources. Previous methods that estimate fruit position, either directly from hardware or through Structure from Motion (SfM), face challenges such as high hardware cost, heavy computational demand, and repetitive counting. In this paper, we present a graph optimization-based system that tightly couples dual-frequency GNSS raw measurements with visual-inertial data to estimate the pitaya state, including its position and radius. First, the system uses a coarse-to-fine approach to estimate the transformations about three axes and align the three complementary sensors to a unified coordinate system, reducing the computational complexity relative to pure visual navigation. Second, for the first time, the system analytically computes the pitaya state from the single-view measurements of a binocular camera, replacing expensive lidar. It uses this computed state as an initial value, then refines the pitaya state and rejects outliers via multiple reprojection residuals from multi-view measurements. Third, for the first time, we integrate ionospheric-free pseudorange and Doppler residuals into the visual-inertial factor graph, mitigating multipath interference in double-layer film greenhouses and assigning each pitaya a unique ID and position. Finally, we compute the weight of each pitaya from a cubic polynomial in its radius and estimate the overall greenhouse yield by summing the individual weights. In greenhouse experiments, our system achieved a Root Mean Square Error (RMSE) of 7 cm for position estimation, an RMSE of 7 mm for radius estimation, and a weight Mean Absolute Percentage Error (MAPE) of about 16% for individual pitayas. Overall, our system provides cost-effective, real-time, non-repetitive, low-miss-rate pitaya yield estimation to assist greenhouse managers in decision-making.
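As a minimal sketch of the final yield step, assuming a full cubic with coefficients $a_0, \dots, a_3$ fitted offline (the fitted values are not reported here), the per-fruit weight model and greenhouse total could take the form
$$
\hat{w}_i = a_3 r_i^{3} + a_2 r_i^{2} + a_1 r_i + a_0, \qquad \hat{W} = \sum_{i=1}^{N} \hat{w}_i,
$$
where $r_i$ is the estimated radius of the pitaya with unique ID $i$ and $N$ is the number of distinct IDs, so that each fruit contributes to the total exactly once.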