Moving object detection (MOD) methods must handle complex situations found in video scenarios, including bootstrapping, illumination changes, bad weather, PTZ cameras, intermittently moving objects, color camouflage, camera jitter, low frame rates, noisy videos, shadows, thermal videos, and night videos. Some of the most promising MOD methods are based on convolutional neural networks (CNNs), which rank among the best algorithms on the CDnet14 dataset. Accordingly, this paper presents a novel CNN for moving object detection called Two-Frame CNN (2FraCNN). Unlike the best-ranked algorithms in CDnet14, 2FraCNN does not rely on transfer learning and uses temporal information to estimate the motion of moving objects. The architecture of 2FraCNN is inspired by how optical flow is used to estimate motion, and its core follows the FlowNet architecture. 2FraCNN processes temporal information by concatenating two consecutive frames and feeding them to an encoder-decoder architecture. 2FraCNN also includes a novel training scheme that handles the imbalance between background and foreground pixel classes. 2FraCNN was evaluated with three schemes: the CDnet14 benchmark, for a state-of-the-art comparison; human-performance metric intervals, for a realistic evaluation; and, for practical purposes, the PVADN instrument, which considers the quantitative criteria of performance, speed, auto-adaptability, documentation, and novelty. Findings show that 2FraCNN performs comparably to the top ten algorithms in CDnet14 and ranks among the best twelve in the PVADN evaluation. 2FraCNN also solves many video challenge categories with human-like performance, including dynamic backgrounds, camera jitter, shadows, bad weather, and thermal cameras. Based on these findings, we conclude that 2FraCNN is a robust algorithm that handles diverse video conditions with performance competitive with state-of-the-art algorithms.
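To make the two-frame input scheme concrete, the sketch below shows a minimal PyTorch encoder-decoder that concatenates two consecutive frames along the channel axis and predicts a per-pixel foreground map, paired with a class-weighted loss as one plausible way to counter the background/foreground imbalance. This is an illustrative assumption, not the published 2FraCNN code: the layer counts, channel widths, class names, and the `pos_weight` value are all hypothetical.

```python
# Minimal sketch of a two-frame encoder-decoder (hypothetical configuration,
# not the published 2FraCNN architecture or training scheme).
import torch
import torch.nn as nn

class TwoFrameEncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: two consecutive RGB frames concatenated -> 6 input channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to the input resolution; one foreground logit per pixel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)  # stack the two frames along channels
        return self.decoder(self.encoder(x))       # per-pixel foreground logits

model = TwoFrameEncoderDecoder()
f_t  = torch.randn(1, 3, 240, 320)    # frame at time t
f_t1 = torch.randn(1, 3, 240, 320)    # frame at time t+1
mask = torch.zeros(1, 1, 240, 320)    # ground-truth foreground mask (0 = background)

# Weighted binary cross-entropy: up-weighting the rare foreground class is one
# standard remedy for pixel imbalance; the paper's actual scheme may differ.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(10.0))
loss = loss_fn(model(f_t, f_t1), mask)
```

Stacking the frames along the channel axis lets the first convolution compare co-located pixels across time, which is the same input strategy FlowNet uses to expose motion cues to the network.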