Images captured in low-light conditions suffer from low visibility and various imaging artifacts, e.g., real-world noise. Existing supervised algorithms for low-light image enhancement require a large set of pixel-aligned training image pairs, which are hard to prepare in practice. Although some recent unsupervised methods alleviate this data challenge, many real-world artifacts are inevitably amplified in the enhanced results because corresponding supervision is lacking. In this paper, instead of using perfectly aligned images for training, we employ misaligned real-world images as guidance, which are considerably easier to collect. Specifically, we propose a Cross-Image Disentanglement Network (CIDN) with weakly supervised learning that separately extracts cross-image brightness features and image-specific content features from low/normal-light images. Based on this disentanglement, CIDN can simultaneously correct the brightness and suppress image artifacts in the feature domain, which largely increases robustness to pixel shifts between training pairs. To account for real-world corruptions, we propose a new training dataset of misaligned and noisy image pairs, together with a corresponding evaluation dataset. Experimental results show that our model achieves state-of-the-art performance on both the newly proposed dataset and other popular low-light datasets. The code is publicly available at https://github.com/GuoLanqing/CIDN.