Multi-step-ahead streamflow forecasting is crucial for effective water resources planning and management in data-scarce regions. This paper develops a coupled model, the CNN-LSTM-Self-attention-Anticipated Learning Machine (CLS-ALM), which combines nonlinear dynamical systems theory with deep learning techniques. First, CLS-ALM introduces a learnable parameter to concatenate and fuse the feature vectors produced by the Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) module and the transformer module. Second, the model generates sampled nondelay attractors from the high-dimensional feature vectors, and ALM learns the mapping from these nondelay attractors to the delay attractor of the target variable. This mapping is referred to as the spatial-temporal information-transformation (STI) equation; it extends the target variable along the temporal dimension and thereby yields the predictions. Third, the model extends one-day-ahead forecasting to multi-day-ahead forecasting. For one-day-ahead prediction at the four stations USGS 01013500, USGS 01031500, USGS 01047000, and USGS 01030500, the correlation coefficient (R) of CLS-ALM exceeds 0.9, and at USGS 01013500 it exceeds 0.98. With 700 training samples and a lead time of 3 days, the Nash-Sutcliffe efficiency (NSE) of CLS-ALM is 393.88%, 55.44%, and 181.63% higher than that of ALM, CL-S, and CL, respectively. At a lead time of 5 days, the NSE of CLS-ALM is 306.45%, 304.77%, and 1162.37% higher than that of ALM, CL-S, and CL, respectively. At the USGS 01030500 station with a lead time of 7 days, the R value of CLS-ALM is 3.55%, 25.06%, and 6.64% higher than that of ALM, CL-S, and CL, respectively. CLS-ALM can therefore integrate and harness the spatiotemporal information embedded in short-term high-dimensional data, mitigating the constraints imposed by limited sample length.
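
For readers who want to reproduce the kind of comparison reported above, the following is a minimal sketch of the two evaluation metrics (NSE and R) and of a relative-improvement figure. It uses the standard definitions of the Nash-Sutcliffe efficiency and the Pearson correlation coefficient; the function names are illustrative, and the formula in `percent_higher` is an assumption about how the "x% higher" values are computed, not something the abstract states.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def pearson_r(observed, simulated):
    """Pearson correlation coefficient between observed and simulated series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)


def percent_higher(value, baseline):
    """Relative gain of `value` over `baseline` in percent.

    ASSUMPTION: (value - baseline) / |baseline|; the paper's exact formula
    is not given in the abstract.
    """
    return 100.0 * (value - baseline) / abs(baseline)


if __name__ == "__main__":
    # Made-up toy series, for illustration only (not data from the paper).
    obs = [1.0, 2.0, 3.0, 4.0]
    sim = [1.1, 1.9, 3.2, 3.8]
    print("NSE:", nse(obs, sim))
    print("R:", pearson_r(obs, sim))
    print("gain (%):", percent_higher(0.9, 0.6))
```

Under this assumed formula, a relative gain grows very large whenever the baseline NSE is close to zero, which is worth keeping in mind when reading percentage improvements well above 100%.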