Cluster scheduling, in which processors are grouped into clusters and the tasks allocated to each cluster are scheduled by a global scheduler, has recently attracted attention in multiprocessor real-time systems research. In this paper, we first investigate the worst-case utilization bound for cluster scheduling when an optimal global scheduler is adopted within each cluster. Specifically, for a system with $m$ homogeneous clusters of $k$ processors each, we show that the worst-case achievable system utilization is $\frac{\lfloor k/\alpha \rfloor \cdot m + 1}{\lfloor k/\alpha \rfloor + 1} \cdot k$, where $\alpha$ is the maximum utilization of the periodic tasks considered. Then, focusing on an efficient optimal global scheduler, namely the boundary-fair (Bfair) algorithm, we propose a period-aware partitioning heuristic that aims to reduce the scheduling overhead. Simulation results show that the percentage of task sets that can be scheduled is significantly improved under cluster scheduling, even for small clusters (e.g., $k = 2$). Moreover, the proposed period-aware partitioning heuristic markedly reduces the scheduling overhead of cluster scheduling with Bfair.
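As a quick illustration of the bound stated above, the following minimal sketch evaluates it for given parameters. It assumes the bound reads (floor(k/alpha) * m + 1) / (floor(k/alpha) + 1) * k, with m clusters, k processors per cluster, and alpha the maximum task utilization in (0, 1]. The function name `cluster_utilization_bound` and its argument conventions are hypothetical and not taken from the paper.

```python
import math
from fractions import Fraction


def cluster_utilization_bound(m: int, k: int, alpha) -> Fraction:
    """Evaluate (floor(k/alpha) * m + 1) / (floor(k/alpha) + 1) * k exactly.

    m     -- number of homogeneous clusters (assumed >= 1)
    k     -- processors per cluster (assumed >= 1)
    alpha -- maximum task utilization, assumed to lie in (0, 1];
             pass a Fraction, int, or string such as "0.5" to avoid
             floating-point rounding inside the floor
    """
    if m < 1 or k < 1:
        raise ValueError("m and k must be positive integers")
    a = Fraction(alpha)
    if not (0 < a <= 1):
        raise ValueError("alpha must lie in (0, 1]")
    # Largest number of maximum-utilization tasks that fit in one cluster.
    beta = math.floor(k / a)
    return Fraction(beta * m + 1, beta + 1) * k


if __name__ == "__main__":
    # Four clusters of two processors each, tasks with utilization at most 1/2.
    bound = cluster_utilization_bound(m=4, k=2, alpha=Fraction(1, 2))
    print(bound, float(bound))  # 34/5 6.8
```

Two sanity checks follow directly from the expression. With a single cluster (m = 1) it reduces to k, the full capacity of the cluster under an optimal global scheduler. With k = 1 and alpha = 1 it reduces to (m + 1)/2, which has the form of the classical bound for fully partitioned scheduling.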