The use of automated methods for detecting and classifying different types of labels in flood images has important applications in hydrologic prediction. In this research, we propose a fully automated end-to-end image detection system that predicts flood stage data using deep neural networks at two US Geological Survey (USGS) gauging stations: Columbus and Sweetwater Creek, Georgia, USA. The images were obtained from the USGS live river web cameras, which are located near the monitoring stations and refresh roughly every 30 s. To estimate the flood stage, a U-Net Convolutional Neural Network (U-Net CNN) was first stacked on top of a segmentation model for noise and feature reduction, which reduced the number of images needed for training. A Long Short-Term Memory (LSTM) network, a dense model, and a CNN were then trained to predict the flood stage time series in near real-time (6, 12, 24, and 48 hr). The results revealed that the U-Net CNN achieved higher image segmentation accuracy when it was stacked in front of the network. The absolute error with the U-Net was 0.0654 feet at Columbus and 0.0035 feet at Sweetwater Creek, both of which are small in practical terms for flood stage estimation. For time series prediction, the LSTM predicted flood stage values more accurately than the other two models for both the historical period (2015-2022) and real-time forecasts, particularly at the 24 and 48 hr timescales. We extensively evaluated the proposed flood stage prediction system against current state-of-the-art methodologies, including ones that are partly crowd-sourced and mined in real time.

Plain Language Summary

In the past few years, image processing techniques have been widely used for image labeling tasks because of their capacity to learn rich features. Numerous studies have addressed real-time river stage prediction, but they have yet to combine multiple data sets (such as time series and image data) for flood stage prediction.
Here, we examined how a Convolutional Neural Network, a Long Short-Term Memory network, and a dense model can be applied to live-streamed images from US Geological Survey (USGS) web cameras to label features for real-time flood gauge estimation. The initial motivation for using these models was to explore the strength of different types of representations for predicting class labels and estimating flood stages in real time. We evaluated our models on a new image data set scraped from USGS live webcams on multiple rivers, together with the associated annotated labels. Compared to other techniques, the proposed end-to-end flood stage estimation approaches produced state-of-the-art results for the two USGS stations. They also demonstrated that data intelligence tied to different sources of labeling can improve flood stage estimation.
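
As a rough illustration of the forecasting setup described above, the sketch below shows one way a flood stage time series could be arranged into input windows and targets for the 6, 12, 24, and 48 hr lead times, and how an absolute error could be computed. It is not the authors' implementation: the hourly sampling, the 24-step lookback, the synthetic series, and the function names `make_windows`, `persistence_forecast`, and `mean_absolute_error` are all assumptions made for illustration, and a simple persistence baseline stands in for the LSTM, dense, and CNN models.

```python
from typing import List, Sequence, Tuple

# Lead times named in the abstract; treating one step as one hour is an assumption.
HORIZONS_HR = (6, 12, 24, 48)


def make_windows(
    stage: Sequence[float], lookback: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each lookback window with the value `horizon` steps after its last sample."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    # `end` is the exclusive end index of the input window.
    for end in range(lookback, len(stage) - horizon + 1):
        inputs.append(list(stage[end - lookback:end]))
        targets.append(stage[end + horizon - 1])
    return inputs, targets


def persistence_forecast(inputs: Sequence[Sequence[float]]) -> List[float]:
    """Baseline (not the paper's model): predict that the last observed stage persists."""
    return [window[-1] for window in inputs]


def mean_absolute_error(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Average of |prediction - observation|, in the units of the input (e.g. feet)."""
    if len(pred) != len(obs) or len(pred) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)


if __name__ == "__main__":
    # Synthetic, hypothetical stage series in feet; real data would come from a gauge record.
    stage_ft = [2.0 + 0.01 * t for t in range(200)]
    for horizon in HORIZONS_HR:
        windows, targets = make_windows(stage_ft, lookback=24, horizon=horizon)
        mae = mean_absolute_error(persistence_forecast(windows), targets)
        print(f"{horizon:>2} hr lead: {len(windows)} samples, baseline MAE = {mae:.4f} ft")
```

In a setup like this, a trained model would replace `persistence_forecast`, and its error at each lead time could be compared against the baseline to judge whether the added complexity pays off.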