The conventional process of generating radiology reports is labor-intensive and time-consuming, requiring radiologists to meticulously describe findings from imaging studies. This manual approach often causes undesirable delays in patient care. Despite advances in computer vision and deep learning, developing an effective computer-aided solution for automated medical report generation remains challenging. Recent progress in deep learning, particularly the advent of contrastive learning, has shown strong performance under natural language supervision. However, its application to medical report generation, particularly for chest X-rays (CXR), has been limited by the lack of large annotated datasets. Many studies have proposed multimodal contrastive learning schemes to address data scarcity for natural images, yet these techniques remain largely unexplored for medical report generation. This study addresses these challenges by proposing a dual contrastive learning network (DuCo-Net) comprising a backbone network and an augmented network. The backbone network is trained on the original data, while the augmented network emphasizes cross-modal augmentation learning within a unified framework. DuCo-Net enables two complementary learning mechanisms: intra-modal learning, in which each network learns specialized features within its own modality (image or text), and inter-modal learning, which captures relationships between the image and text modalities through a combined loss function. This dual learning approach leverages modified DenseNet121 and BioBERT models with advanced pooling techniques tailored to medical data. Comprehensive evaluations on two publicly available datasets demonstrate that DuCo-Net significantly outperforms current benchmarks. On the Indiana University Chest X-ray dataset, the proposed method achieves marked improvements across standard metrics (BLEU-1: 0.50, ROUGE: 0.40, METEOR: 0.24, F1: 0.40). On the MIMIC-CXR dataset, the framework maintains robust performance (BLEU-1: 0.42, ROUGE: 0.34, METEOR: 0.20, F1: 0.34), representing substantial improvements over existing state-of-the-art approaches in medical report generation.
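Since the abstract describes the combined intra-/inter-modal objective only at a high level, the sketch below illustrates one way such a loss could be assembled. It is a minimal illustration under stated assumptions: the function names (`info_nce`, `duco_style_loss`), the temperature, and the weights `w_intra`/`w_inter` are hypothetical and not taken from the paper.

```python
# Illustrative sketch: combining intra-modal (original vs. augmented view) and
# inter-modal (image vs. report) contrastive terms into one loss. Names and
# hyperparameters are assumptions for demonstration, not the paper's method.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss: matched rows of `a` and `b` are positives, all other rows negatives."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # diagonal entries are positives
    return F.cross_entropy(logits, targets)


def duco_style_loss(img_emb, img_emb_aug, txt_emb, txt_emb_aug, w_intra=1.0, w_inter=1.0):
    """Combine intra-modal and inter-modal contrastive terms with assumed weights."""
    # Intra-modal: each network contrasts its original and augmented views.
    l_intra = info_nce(img_emb, img_emb_aug) + info_nce(txt_emb, txt_emb_aug)
    # Inter-modal: align paired image and report embeddings (symmetric, CLIP-style).
    l_inter = 0.5 * (info_nce(img_emb, txt_emb) + info_nce(txt_emb, img_emb))
    return w_intra * l_intra + w_inter * l_inter


if __name__ == "__main__":
    n, d = 8, 256  # batch size and embedding dimension, arbitrary for the demo
    loss = duco_style_loss(torch.randn(n, d), torch.randn(n, d),
                           torch.randn(n, d), torch.randn(n, d))
    print(f"combined contrastive loss: {loss.item():.4f}")
```

In practice the image embeddings would come from the modified DenseNet121 encoder and the text embeddings from BioBERT, with the backbone network fed original data and the augmented network fed augmented views; the weighting between intra- and inter-modal terms is shown here only as a placeholder.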