Federated Learning (FL) enables multiple clients to collaboratively train a shared server model while preserving data privacy. Most existing FL systems assume that the server model and client models share a homogeneous architecture. However, the intensive resource requirements of training prevent low-end devices from contributing their data to the server model. Conversely, the resource constraints of participating clients can significantly limit the size of the server model in the model-homogeneous setting, restricting the application scope of FL. In this work, we propose FedEKT, a novel model-heterogeneous FL system designed to obtain a high-performance large server model while also improving heterogeneous small client models. Specifically, we design a new aggregation approach that integrates knowledge from heterogeneous client models into a large server model while mitigating the adverse effects of biases stemming from data heterogeneity. FedEKT then distills the large server model into the heterogeneous client models, transferring the integrated knowledge back so that each client model benefits from the high-performance server model. In addition, we design specialized model modules and a communication strategy that accomplish knowledge aggregation and transfer in a data-free manner. Evaluation results demonstrate that FedEKT improves the accuracy of the server model and client models by up to 53.96% and 12.35%, respectively, compared with the state-of-the-art FL approach on CIFAR-100.
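
To make the two-stage pipeline concrete, the following is a minimal sketch of data-free, distillation-based knowledge exchange in the spirit described above. Everything in it is an illustrative assumption rather than the paper's actual design: the toy MLP architectures, the fixed generator that synthesizes pseudo-inputs, the plain logit averaging used as the ensemble teacher (FedEKT's aggregation additionally counters data-heterogeneity bias), and all hyperparameters.

```python
# Illustrative sketch only: two-stage, data-free knowledge transfer between
# heterogeneous client models and a larger server model. Architectures,
# generator, and hyperparameters are hypothetical, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

NUM_CLASSES, FEAT_DIM, NOISE_DIM = 10, 32, 16

# Heterogeneous client models (different widths) and a larger server model.
clients = [mlp(FEAT_DIM, h, NUM_CLASSES) for h in (8, 16, 24)]
server = mlp(FEAT_DIM, 64, NUM_CLASSES)
# Generator mapping noise to pseudo-inputs, so no real client data is
# exchanged (the "data-free" part). Its own training is omitted here.
generator = mlp(NOISE_DIM, 32, FEAT_DIM)

opt_server = torch.optim.Adam(server.parameters(), lr=1e-3)
opt_clients = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in clients]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Standard KL-based distillation loss on temperature-softened outputs."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

for step in range(100):
    x = generator(torch.randn(64, NOISE_DIM)).detach()  # synthetic batch

    # Stage 1: aggregate client knowledge into the server model by
    # distilling from averaged client logits (a simple ensemble teacher;
    # a bias-aware weighting would replace this plain mean in practice).
    with torch.no_grad():
        teacher = torch.stack([c(x) for c in clients]).mean(dim=0)
    opt_server.zero_grad()
    kd_loss(server(x), teacher).backward()
    opt_server.step()

    # Stage 2: transfer the integrated knowledge back by distilling the
    # server model into each heterogeneous client model.
    with torch.no_grad():
        server_logits = server(x)
    for c, opt in zip(clients, opt_clients):
        opt.zero_grad()
        kd_loss(c(x), server_logits).backward()
        opt.step()
```

In a complete system the generator would itself be trained (for example, adversarially against the ensemble) and the server-side aggregation would weight clients to mitigate data-heterogeneity bias; both are simplified away in this sketch.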