Mosquito-borne diseases affect over 3 billion people worldwide and cause more than 600,000 fatalities annually. Precise identification of mosquito species is essential for predicting disease outbreaks, yet it is hindered by manual, subjective methods that demand specialized skills and hardware, limiting the scalability of monitoring. Climate change further aggravates this challenge by altering mosquito habitats. Citizen science, using smartphone-captured mosquito images, offers a path to scalable mosquito identification and tracking. However, smartphone images gathered by citizen scientists may contain varied backgrounds, multiple mosquitoes, or damaged specimens, which challenge classic image classification. Our approach employs object detection and classification to accurately identify mosquito species from varied smartphone photos. This study's contributions include converting an image classification dataset into an object detection dataset with precise bounding boxes and species labels, and using two distinct datasets for model training and testing to establish the model's generalizability. Our training dataset comprises 10,000 images, and we focus on the two most represented categories, Aedes albopictus and Culex quinquefasciatus, key vectors of diseases such as Dengue, Zika, and West Nile virus. This dataset originates from the Mosquito Alert project, a collaborative citizen science initiative in which volunteers capture and submit mosquito photographs. Contributors span a diverse geographical range, including Spain, the Netherlands, Italy, and Hungary, and contributed to the dataset between 2014 and 2022. A YOLOv8 model was optimized through hyperparameter tuning on this dataset, achieving over 98% mean Average Precision at 50% Intersection over Union (mAP50) during validation. The refined model was then evaluated on a secondary dataset, Mosquito on Human Skin (MOHS), gathered by Malaysian researchers, where it achieved an mAP50 of 99%, indicating strong performance in identifying mosquito species from varied image inputs. Furthermore, cross-testing showed that each model achieved an mAP50 of over 98% on the other dataset, demonstrating the generalizability of our approach. Comparisons with a ResNet101 image-classification baseline show that our approach outperforms a standard neural network on the real-world Mosquito Alert dataset and enables a generalization from the lab-based dataset that is not possible with the classification baseline.
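As a concrete illustration of the pipeline summarized above, the sketch below uses the Ultralytics YOLOv8 API to fine-tune a pretrained detector on a detection-format mosquito dataset, validate it, and cross-test it on a second dataset. The checkpoint choice, the dataset config filenames (mosquito_alert.yaml, mohs.yaml), and the hyperparameter values are illustrative assumptions, not the tuned settings reported in the study.

```python
# Minimal sketch of the train / validate / cross-test workflow.
# Assumptions: dataset configs "mosquito_alert.yaml" and "mohs.yaml"
# describe the two datasets in YOLO format (images + bounding boxes
# + species labels); epochs/imgsz are placeholder values, not the
# study's tuned hyperparameters.
from ultralytics import YOLO

# Start from a pretrained YOLOv8 checkpoint and fine-tune it on the
# citizen-science detection dataset.
model = YOLO("yolov8m.pt")
model.train(data="mosquito_alert.yaml", epochs=100, imgsz=640)

# Validate on the held-out split of the training dataset and report
# mean Average Precision at 50% IoU.
val_metrics = model.val(data="mosquito_alert.yaml")
print(f"Mosquito Alert mAP50: {val_metrics.box.map50:.3f}")

# Cross-test: evaluate the same weights on the second (MOHS) dataset
# to probe generalization across data sources.
cross_metrics = model.val(data="mohs.yaml")
print(f"MOHS mAP50: {cross_metrics.box.map50:.3f}")
```

Swapping the two config files reproduces the reverse direction of the cross-test, in which a model trained on MOHS is evaluated on the Mosquito Alert data.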