This study introduces a reinforcement learning (RL)-based control strategy for heating, ventilation, and air conditioning (HVAC) systems, employing a soft actor-critic agent with a customized reward mechanism. The reward integrates time-varying weighting factors that depend on outdoor temperature to dynamically balance thermal comfort and energy efficiency. We evaluate the method on two test cases within the building optimization testing (BOPTEST) framework, an open-source virtual simulator equipped with standardized key performance indicators (KPIs) for performance assessment. The test cases are selected to represent distinct building typologies, climatic conditions, and HVAC system complexities, so that the method is assessed across diverse settings. The first test case is a heating-focused scenario in a residential setting. Here, we directly compare our method against four control strategies: an optimized rule-based controller provided by BOPTEST, two RL-based strategies that use BOPTEST's KPIs as reward references, and a model predictive control (MPC)-based approach tailored to the test case. Our results indicate that our approach outperforms the rule-based and other RL-based strategies and achieves outcomes comparable to the MPC-based controller. The second test case, a cooling-dominated scenario in an office setting, further validates the versatility of our strategy under different conditions. The consistent performance of our strategy across both scenarios underscores its potential as a robust tool for smart building management, adaptable to both residential and office environments under different climatic conditions.

Impact Statement

Worldwide, heating, ventilation, and air conditioning (HVAC) systems in buildings account for substantial energy consumption and emissions.
They also often contribute to the peak load in buildings, which places stress on electricity infrastructure. To meet the demands of HVAC systems while optimizing energy use and efficiency, advanced control strategies are essential. However, traditional control methods, such as rule-based and model-based approaches, often face challenges such as extensive model development, which slow their adoption in industry. In this context, reinforcement learning (RL) has emerged as a promising, model-free solution. Despite its potential, little research has tailored RL specifically to HVAC control by accounting for the unique characteristics and requirements of these systems and demonstrating its practical application across diverse scenarios. To address these gaps, we have developed an environment-adaptive, single-agent RL control method and demonstrated its effectiveness across different climates and building types. This work contributes to the growing body of literature on RL-based control methods for HVAC systems.
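
To make the idea of a reward with outdoor temperature-dependent weighting factors concrete, the following is a minimal sketch and not the formulation used in this study. The comfort band, the linear weighting rule, the weight range, and every name in it (`ComfortBand`, `comfort_weight`, `reward`, `t_mild`, `span`, `max_power_kw`) are hypothetical choices made purely for illustration. The sketch assumes that comfort receives more weight as outdoor conditions become more extreme, and that the remaining weight goes to energy use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComfortBand:
    """Indoor temperature bounds in degrees Celsius (hypothetical values)."""
    lower: float = 21.0
    upper: float = 24.0


def comfort_weight(t_out: float, t_mild: float = 15.0, span: float = 20.0) -> float:
    """Map outdoor temperature to a comfort weight in [0.2, 0.8].

    Assumption for illustration only: the further the outdoor temperature
    is from a mild reference value, the more weight comfort receives.
    """
    severity = min(abs(t_out - t_mild) / span, 1.0)
    return 0.2 + 0.6 * severity


def reward(t_in: float, t_out: float, power_kw: float,
           band: ComfortBand = ComfortBand(), max_power_kw: float = 10.0) -> float:
    """Negative weighted sum of comfort violation and normalized power."""
    # Degrees by which the indoor temperature lies outside the comfort band.
    discomfort = max(band.lower - t_in, 0.0) + max(t_in - band.upper, 0.0)
    # Power normalized to [0, 1] by an assumed maximum.
    energy = min(power_kw / max_power_kw, 1.0)
    w_comfort = comfort_weight(t_out)
    w_energy = 1.0 - w_comfort
    return -(w_comfort * discomfort + w_energy * energy)
```

Under these assumptions, the same 2 °C comfort violation is penalized more heavily on a cold day than on a mild one, which is one simple way an agent could be steered toward comfort when conditions are harsh and toward energy savings otherwise.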