With the growing demand for decarbonization and energy efficiency, advanced HVAC control using Deep Reinforcement Learning (DRL) has become a promising solution. Owing to its flexible structure, DRL has successfully reduced energy consumption in many HVAC systems. However, only a few studies have applied DRL agents to manage an entire central HVAC system and control multiple components in both the water loop and the air loop, because of the complexity of such systems. Moreover, these studies have not extended their applications to incorporate indoor air quality, in particular both CO2 and PM2.5 concentrations, on top of energy saving and thermal comfort, as pursuing all of these objectives simultaneously can cause multiple control conflicts. Furthermore, DRL agents are usually trained in a simulation environment before deployment, so another challenge is to develop an accurate yet relatively simple simulator. We therefore propose a DRL algorithm for a central HVAC system that co-optimizes energy consumption, thermal comfort, indoor CO2 level, and indoor PM2.5 level in an office building. To train the controller, we also developed a hybrid simulator that decouples the complex system into multiple simulation models, each calibrated separately with laboratory test data. The hybrid simulator combines the dynamics of the HVAC system and the building envelope with moisture, CO2, and particulate matter transfer. Three control algorithms (rule-based, Model Predictive Control (MPC), and DRL) were developed, and their performance was evaluated in the hybrid simulator under a realistic scenario (i.e., with stochastic noise). The test results showed that the DRL controller saves 21.4% of energy compared with the rule-based controller while also improving thermal comfort and reducing the indoor CO2 concentration. The MPC controller achieved an 18.6% energy saving compared with the DRL controller, mainly through savings from comfort and indoor air quality boundary violations caused by unmeasured disturbances, and it also revealed the computational challenges that non-linear optimization poses for real-time control. Finally, we provide practical considerations for designing and implementing the DRL and MPC controllers based on their respective strengths and weaknesses.