Purpose - The aim of this paper is to document how two university libraries determined whether mystery shopping is an effective and statistically feasible instrument for evaluating customer service at public service desks.

Design/methodology/approach - Mystery shopping exercises were conducted at both libraries during the 2008 spring and fall semesters. Trained mystery shoppers recorded staff behaviors, the answers given to their reference questions, and open-ended comments about their reference experience. Using ClinTools, Excel, and ATLAS.ti, the authors conducted a meta-analysis of the data.

Findings - Mystery shopping is an effective method for evaluating customer service in libraries. The shoppers observed staff behaviors that were generally in line with the libraries' guidelines, but their comments included suggestions for improvement. When the behavior rubric results were combined with the comments, the authors learned that shoppers were somewhat dissatisfied.

Research limitations/implications - The results are approximate because the two instruments used were not identical, so only their common elements could be combined, with some loss of accuracy. In this study, the authors used meta-analysis to compensate for the differences between the instruments. Another solution would be to create one instrument for both institutions that contains common elements for interlibrary comparison and local elements for local customization.

Practical implications - Other libraries can adapt this mystery shopping methodology and data analysis to measure customer service in their libraries.

Originality/value - No other study of mystery shopping has included the questionnaires used at both institutions, the aggregated data, and the method of analysis for meaningful evaluation.