The deep stochastic configuration network (DSCN) is a randomized incremental learning model that starts from a small structure and gradually adds hidden nodes and hidden layers. The input weights and biases of new nodes are assigned under a supervisory mechanism, all hidden nodes are fully connected to the outputs, and the output weights are determined by the least-squares method. DSCN therefore requires little manual intervention and offers high learning efficiency and strong generalization ability. However, although the randomized feedforward learning process of DSCN is fast, its feature learning ability remains insufficient, and as nodes and hidden layers accumulate the model becomes prone to overfitting. When solving regression problems with noisy data, the performance of the original DSCN is easily degraded by outliers, which reduces its generalization ability. To improve the regression performance and robustness of DSCN, weighted deep stochastic configuration networks (WDSCN) based on M-estimator functions are proposed. First, two common M-estimator functions (Huber and Bisquare) are adopted to compute sample weights that reduce the negative impact of outliers: a sample with a small training error receives a large weight, whereas a sample with a large training error is treated as an outlier and receives a small weight. Because the sample weight decreases monotonically as the absolute error grows, the influence of noisy data on the model is reduced and the generalization of the algorithm improves. Meanwhile, the weighted least-squares method combined with an L2 regularization strategy replaces the ordinary least-squares method for computing the output weight vector; this both handles noisy regression data and avoids the overfitting problem of DSCN. Second, since L1-regularized models help extract sparse features and improve the accuracy of supervised learning, a stochastic configuration sparse autoencoder (SC-SAE) is designed to further improve the representation ability of WDSCN. SC-SAE assigns its input parameters with the supervisory mechanism of DSCN, adds an L1 regularization term to the objective function to obtain sparse features, and solves the objective with the alternating direction method of multipliers (ADMM) to determine its output weights. Because the encoding process of SC-SAE is randomized, different SC-SAE models yield diverse features, so an effective feature representation can be obtained by fusing the features from multiple SC-SAEs for training the WDSCN. Finally, experimental results on real-world datasets show that the proposed WDSCN-Huber and WDSCN-Bisquare achieve higher generalization performance and regression accuracy than DSCN, SCN, and other weighted models (e.g., RSC-KDE, RSC-Huber, RSC-IQR, RDSCN-KDE, WBLS-KDE, and RBLS-Huber). In addition, ablation experiments show that WDSCN trained with sparse features fused from multiple different SC-SAE models outperforms the corresponding models without the fused sparse features. This verifies that SC-SAE can extract effective sparse features and improve the learning ability of the weighted models. © 2023 Science Press. All rights reserved.
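As a concrete illustration of the M-estimator weighting scheme described above, the following minimal Python sketch computes per-sample weights from training residuals. This is a generic textbook formulation, not code from the paper: the tuning constants (1.345 for Huber and 4.685 for Bisquare, the usual 95%-efficiency values under Gaussian noise) and the MAD-based residual standardization are assumptions.

```python
import numpy as np

def huber_weights(residuals, c=1.345):
    """Huber weight: 1 for small scaled errors, decaying as c/|e| beyond c."""
    mad = np.median(np.abs(residuals - np.median(residuals)))
    e = residuals / max(mad / 0.6745, 1e-12)  # robust standardization of residuals
    return np.where(np.abs(e) <= c, 1.0, c / np.abs(e))

def bisquare_weights(residuals, c=4.685):
    """Tukey bisquare weight: smooth decay toward 0; gross outliers get weight 0."""
    mad = np.median(np.abs(residuals - np.median(residuals)))
    e = residuals / max(mad / 0.6745, 1e-12)
    w = (1.0 - (e / c) ** 2) ** 2
    return np.where(np.abs(e) <= c, w, 0.0)
```

Both weight functions are non-increasing in the absolute scaled error, matching the behavior the abstract describes: small-error samples keep a weight near 1, while large-error (outlier) samples are down-weighted or, under Bisquare, rejected entirely.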
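The weighted least-squares step with L2 regularization admits the standard closed form beta = (H^T W H + lambda*I)^{-1} H^T W y, where H is the hidden-layer output matrix, W = diag(w) holds the M-estimator sample weights, and lambda is the ridge coefficient. A minimal sketch, with lambda as an assumed hyperparameter:

```python
import numpy as np

def weighted_ridge_output_weights(H, y, w, lam=1e-3):
    """Closed-form solution of min ||W^(1/2) (H @ beta - y)||^2 + lam * ||beta||^2."""
    WH = H * w[:, None]                      # row-scale H by the sample weights (W @ H)
    A = H.T @ WH + lam * np.eye(H.shape[1])  # H^T W H + lambda * I
    return np.linalg.solve(A, WH.T @ y)      # H^T W y on the right-hand side
```

Setting all weights to 1 recovers plain ridge regression, and lam = 0 recovers the weighted least-squares solution, so the two ingredients named in the abstract (robust weighting and L2 regularization) are both visible in the formula.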
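For the SC-SAE output weights, the abstract states that an L1-regularized objective is solved with ADMM. The sketch below is the standard ADMM iteration for a lasso-type problem, min_B 0.5*||H @ B - X||_F^2 + lam*||B||_1, assuming the autoencoder reconstructs its input X from the randomized hidden encoding H; rho and the iteration count are assumed hyperparameters, and the paper's exact objective may differ.

```python
import numpy as np

def soft_threshold(V, k):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(V) * np.maximum(np.abs(V) - k, 0.0)

def admm_lasso(H, X, lam=1e-2, rho=1.0, n_iter=100):
    """ADMM for min_B 0.5*||H @ B - X||_F^2 + lam*||B||_1 (sparse output weights)."""
    n = H.shape[1]
    B = np.zeros((n, X.shape[1]))
    Z = np.zeros_like(B)
    U = np.zeros_like(B)
    L = np.linalg.cholesky(H.T @ H + rho * np.eye(n))  # factor once, reuse every iteration
    HtX = H.T @ X
    for _ in range(n_iter):
        rhs = HtX + rho * (Z - U)
        B = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # B-update (ridge-like solve)
        Z = soft_threshold(B + U, lam / rho)               # sparsifying Z-update
        U = U + B - Z                                      # dual-variable update
    return Z  # Z carries the exact zero pattern of the sparse solution
```

The soft-thresholding step is what produces exact zeros in the output weights, which is the source of the sparse features the abstract attributes to the L1 penalty.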
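Finally, the fusion of features from multiple SC-SAEs can be as simple as concatenating their encodings. The abstract does not specify the fusion operator, so the following is only one plausible reading; `encoders` is a hypothetical list of trained SC-SAE encoding functions.

```python
import numpy as np

def fuse_sparse_features(X, encoders):
    """Concatenate the encodings from several independently randomized SC-SAEs."""
    return np.hstack([encode(X) for encode in encoders])
```

Because each SC-SAE is randomized independently, the individual encodings differ, and under this reading the concatenated representation is what the WDSCN is trained on.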