Accurate Direction of Arrival (DoA) estimation is pivotal in applications such as wireless communication, radar, and sensor arrays, where precise spatial localization directly affects system performance and efficiency. Low signal-to-noise ratio (SNR) and a limited number of snapshots pose formidable challenges to accurate DoA estimation. Both conventional model-based techniques and recent deep learning (DL) based models that map sample covariance matrices to DoA spectrum estimates struggle in such environments. In this study, we introduce a DL framework that takes the sample covariance matrix as input and predicts the corresponding DoAs jointly with an estimate of the true covariance matrix. The proposed architecture comprises two main components, both of which employ Convolutional Neural Networks (CNNs). The first component reconstructs the covariance matrix, aligning its output with the true covariance of the given sample, and the second performs multi-label classification for the DoA estimation step. Unlike previous on-grid CNN approaches, which are trained with Binary Cross-Entropy (BCE) loss alone, our study adopts a holistic training strategy that incorporates three individual loss terms into one novel combined loss function. The combined loss integrates a Mean Squared Error (MSE) loss for reconstruction of the true covariance matrix, intended to enhance model performance particularly in low-SNR and few-snapshot scenarios, with BCE and MSE losses for angle estimation. This combination demonstrates improved robustness and performance compared to existing CNN-based approaches.
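To make the three-term loss concrete, the following is a minimal NumPy sketch and not the authors' implementation. It makes several assumptions that the text does not state: the three terms are combined as a weighted sum, the weights `w_cov`, `w_bce`, and `w_mse` are hypothetical, the angle MSE is taken between the predicted grid probabilities and the multi-hot label vector, and the array size `M` and grid size `G` are illustrative only.

```python
import numpy as np

def mse(a, b):
    # Mean squared error; np.abs makes this valid for complex covariance entries.
    return float(np.mean(np.abs(a - b) ** 2))

def bce(p, y, eps=1e-12):
    # Binary cross-entropy averaged over all grid points (multi-label setting).
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def combined_loss(cov_hat, cov_true, p_hat, y_true,
                  w_cov=1.0, w_bce=1.0, w_mse=1.0):
    # Assumed form: weighted sum of covariance MSE, angle BCE, and angle MSE.
    # The weights are hypothetical; the source does not specify them.
    return (w_cov * mse(cov_hat, cov_true)
            + w_bce * bce(p_hat, y_true)
            + w_mse * mse(p_hat, y_true))

# Toy example with illustrative sizes (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
M, G = 4, 181                      # sensors, angular grid points
cov_true = np.eye(M, dtype=complex)
noise = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
cov_hat = cov_true + 0.1 * noise   # imperfect covariance reconstruction

y_true = np.zeros(G)
y_true[[60, 120]] = 1.0            # two sources on the grid (multi-hot label)
p_hat = np.full(G, 0.05)
p_hat[[60, 120]] = 0.9             # predicted per-grid-point probabilities

loss = combined_loss(cov_hat, cov_true, p_hat, y_true)
print(f"combined loss: {loss:.4f}")
```

In this sketch the covariance term penalizes the reconstruction branch while the two angle terms act on the classification output, so a single scalar can drive both components during training.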