In this article, we propose an effective infrared and visible image fusion network based on contrastive learning, termed CLF-Net. A novel noise-contrastive-estimation framework is introduced into image fusion to maximize the mutual information between the fused image and the source images. First, an unsupervised contrastive learning framework is constructed that encourages the fused image to selectively retain the most similar features in local regions of the different source images. Second, we design a robust contrastive loss based on the deep representations of the images, combined with a structural similarity loss, to effectively guide the network in extracting and reconstructing features. Specifically, based on the deep-representation similarities and structural similarities between the fused image and the source images, the loss functions guide the feature extraction network to adaptively capture the salient targets of infrared images and the background textures of visible images; the features are then reconstructed in the most appropriate manner. In addition, our method is an unsupervised end-to-end model. We evaluate it on public datasets, and extensive qualitative and quantitative analyses demonstrate that the proposed method outperforms existing state-of-the-art fusion methods. Our code is publicly available at https://github.com/zzj-dyj/CLF-Net
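As a concrete illustration of the noise-contrastive objective described above, the sketch below computes a patch-wise InfoNCE loss: the deep feature of a fused-image patch (the query) is pulled toward the feature of the co-located source-image patch (the positive) and pushed away from features of other patches (the negatives). This is a minimal, generic formulation; the function name, feature shapes, and temperature value are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """Patch-wise noise-contrastive (InfoNCE) loss.

    query:     (D,) deep feature of a fused-image patch
    positive:  (D,) feature of the co-located source-image patch
    negatives: (N, D) features of other, non-corresponding patches
    Returns the scalar loss; lower means the fused patch agrees
    with its corresponding source patch. (Illustrative sketch.)
    """
    def cos(a, b):
        # cosine similarity with a small epsilon for stability
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    # logits over [positive, negatives...], scaled by temperature
    logits = np.array([cos(query, positive)] +
                      [cos(query, n) for n in negatives]) / temperature
    logits -= logits.max()  # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # positive sits at index 0
```

Minimizing this loss over patches maximizes a lower bound on the mutual information between the fused image and the source images; in the full method such a term would be combined with a structural similarity (SSIM) loss.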