Accurate prediction of generation from distributed photovoltaic (PV) systems depends on three critical parameters: irradiance, ambient temperature, and module temperature. However, the completeness of these data cannot be guaranteed because of issues in data acquisition. Many methods in the literature address missing data, but their applicability varies with the missingness mechanism, and methods for imputing missing data in PV systems remain largely unexplored. This paper conducts statistical analyses to understand the missingness mechanism in data from a real grid-tied 1.4 MW PV system in Miami, and compares the imputation performance of several methods (random imputation, multiple imputation using expectation-maximization, k-nearest neighbors (kNN), and random forests) using error metrics and effect size measures. The imputed values are then used in a multilayer perceptron to predict PV generation, and the predictions are compared with observed values. Results show that values imputed using kNN and random forests have the smallest differences in proportions and can help utilities predict generation more accurately for distribution planning.