In this study, a reinforcement learning (RL) algorithm is employed in the energy management system (EMS) of battery energy storage systems (BESs) within a multilevel microgrid. The microgrid integrates photovoltaic (PV) plants and wind turbines (WTs) through a multilevel configuration based on battery-energy-stored quasi-Z-source cascaded H-bridge multilevel inverters (BES-qZS-CHBMLIs). A twin-delayed deep deterministic policy gradient (TD3) agent serves as the RL agent, dispatching power among the BESs to meet the requested grid power while accounting for BES efficiency and lifetime. Two 4.8 kW PV plants and a 5 kW WT, each integrating a BES of different rated capacity, are connected to the grid through the BES-qZS-CHBMLI configuration, and the resulting microgrid is simulated in MATLAB to evaluate the performance of the proposed RL-EMS. For comparison, an SOC-based EMS (SOC-EMS), a fuzzy logic EMS (FL-EMS), and two nonlinear-optimization-based EMSs (using PSO and fmincon) are also implemented. The comparison demonstrates the superior performance of the RL-based EMS, with improvements of up to 18.09 % in the integral time absolute error (ITAE) of the active power, 17.77 % in the ITAE of the reactive power, and 21.38 % in the standard deviation (STD) of the active power relative to the SOC-, fuzzy-, fmincon-, and PSO-based EMSs. Additionally, among the baseline methods, the fmincon-EMS shows a notable improvement, achieving up to 15.12 % better performance in power-demand tracking and BES dispatch. In a dynamic environment with fluctuating power production and demand, the trained RL agent effectively optimizes power injection and storage among the BESs while maintaining grid-demand tracking and battery SOC balance.
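The comparison metrics named above (ITAE and STD of the power-tracking error) can be sketched as follows. This is a minimal illustration only: the reference power, delivered power, and time window below are hypothetical placeholders, not data from the study.

```python
import numpy as np

def itae(t, error):
    """Integral time absolute error: integral of t * |e(t)| dt (trapezoidal rule)."""
    y = t * np.abs(error)
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(t)))

# Hypothetical 10 s window of active-power tracking (illustrative values only).
t = np.linspace(0.0, 10.0, 1001)
p_ref = 5.0 * np.ones_like(t)                       # requested grid power (kW)
p_out = p_ref + 0.2 * np.sin(2 * np.pi * 0.5 * t)   # delivered power (kW)
err = p_ref - p_out                                 # tracking error (kW)

score_itae = itae(t, err)   # lower is better: late errors are penalized more
score_std = float(np.std(err))  # dispersion of the tracking error
```

Because ITAE weights the absolute error by elapsed time, it penalizes errors that persist late in the window, which is why it is a common figure of merit for tracking performance in EMS comparisons.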