Fast-flowing watersheds are characterized by rapid runoff and confluence, which makes accurate flood prediction difficult. To address this, we introduce three flood forecasting model structures: GRU-ED, LSTM-FED, and LSTM-DSA. Applying them to three representative watersheds, we found the following. First, because input information attenuates, the predictive ability of the models may decline as lead time is extended. Incorporating a feedback mechanism effectively mitigates this decline, yielding an average 5% improvement in Nash-Sutcliffe efficiency (NSE) and a 26.4% reduction in the interquartile range of relative peak error. Second, model performance is influenced by several factors, including watershed characteristics, sample size, and temporal resolution; further investigation is required to determine the extent of their influence. Third, the attention mechanism dynamically assigns weights to input data and significantly improves model performance, especially for larger catchments, with an average increase in NSE of approximately 7.86% and a reduction in the interquartile range of relative peak error of about 30.7%. Finally, the proposed models achieve high accuracy in flood forecasting within a specific lead time, offering a deep learning-based solution to flood forecasting in fast-flowing watersheds.
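For readers unfamiliar with the two evaluation metrics quoted above, the following is a minimal sketch of how Nash efficiency and the interquartile range of relative peak error are commonly computed. It assumes the standard textbook definitions; the abstract does not state the exact formulas or quartile convention used, and the function names (`nash_efficiency`, `relative_peak_error`, `peak_error_iqr`) are hypothetical.

```python
from statistics import mean, quantiles


def nash_efficiency(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.

    Assumes the standard definition; 1.0 is a perfect fit, 0.0 matches
    the skill of always predicting the observed mean.
    """
    obs_mean = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / spread


def relative_peak_error(observed, simulated):
    """Relative error of the simulated flood peak for one event.

    Assumed form: (simulated peak - observed peak) / observed peak.
    """
    obs_peak = max(observed)
    return (max(simulated) - obs_peak) / obs_peak


def peak_error_iqr(errors):
    """Interquartile range (Q3 - Q1) of per-event relative peak errors.

    The quartile method ("inclusive") is an assumption; other
    conventions give slightly different values on small samples.
    """
    q1, _, q3 = quantiles(errors, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    obs = [1.0, 5.0, 2.0]
    sim = [1.0, 4.0, 2.0]
    print(nash_efficiency(obs, sim))
    print(relative_peak_error(obs, sim))
    print(peak_error_iqr([0.0, 0.1, 0.2, 0.3, 0.4]))
```

A narrower interquartile range indicates that peak errors are more consistent across flood events, which is why a reduction in this quantity is reported as an improvement.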