The EM algorithm is one of the most suitable iterative methods for PET image reconstruction; however, it requires a long computation time and an enormous amount of memory. To overcome these two problems, we present two classes of highly efficient parallelization schemes: homogeneous and inhomogeneous partitionings. The essential difference between the two classes is that the inhomogeneous partitioning schemes can partially overlap communication with computation by deliberately exploiting the inherent data access pattern through a multiple-ring communication pattern. In theory, the inhomogeneous partitioning schemes may therefore outperform the homogeneous ones; the homogeneous schemes, however, require a simpler communication pattern. To estimate the achievable performance and analyze the factors that degrade it without actual implementations, we derive efficiency prediction formulas that closely estimate the performance of the proposed parallelization schemes. We also propose new integration and broadcasting algorithms for hypercube, ring, and n-D mesh topologies, which are more efficient than the conventional algorithms when the link setup time is negligible. We believe that the concepts behind the proposed task and data partitioning schemes, the integration and broadcasting algorithms, and the efficiency estimation methods can be applied to many other problems that are rich in data parallelism but lack a balanced exclusive partitioning.
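To illustrate why a partitioned EM iteration needs an integration step followed by a broadcast, the following is a minimal single-process sketch, not the implementation described in this paper. It assumes the standard ML-EM update for emission tomography, a small dense system matrix `A` (detector bins by pixels), and a simple row-wise split of the detector bins among `n_workers` simulated workers. The function names and the row-wise split are illustrative assumptions and do not correspond to the homogeneous or inhomogeneous schemes proposed here.

```python
def partial_backprojection(A_rows, y_rows, x):
    """Work done by one (simulated) worker on its own detector bins.

    Returns the partial backprojection of the measured/expected ratio
    and the partial sensitivity (column sums) for the worker's rows.
    """
    back = [0.0] * len(x)
    sens = [0.0] * len(x)
    for row, y_i in zip(A_rows, y_rows):
        # Forward projection: expected count in this detector bin.
        p = sum(a * x_j for a, x_j in zip(row, x))
        # Ratio of measured to expected counts (guard against empty bins).
        r = y_i / p if p > 0 else 0.0
        for j, a in enumerate(row):
            back[j] += a * r
            sens[j] += a
    return back, sens


def mlem_step(A, y, x, n_workers=1):
    """One ML-EM update with detector bins split among n_workers.

    The row-wise split is an assumption made for illustration only.
    """
    n_bins = len(A)
    chunk = (n_bins + n_workers - 1) // n_workers
    back = [0.0] * len(x)
    sens = [0.0] * len(x)
    for w in range(n_workers):
        lo, hi = w * chunk, min((w + 1) * chunk, n_bins)
        b, s = partial_backprojection(A[lo:hi], y[lo:hi], x)
        # "Integration": sum the partial results of all workers.
        back = [u + v for u, v in zip(back, b)]
        sens = [u + v for u, v in zip(sens, s)]
    # The updated image is what every worker would need for the next
    # iteration, i.e. what a broadcast would distribute.
    return [x_j * b / s if s > 0 else 0.0
            for x_j, b, s in zip(x, back, sens)]
```

In this sketch each pixel of the updated image depends on contributions from every worker, so the partial results cannot be kept exclusive to one worker; they must be summed and the result redistributed in every iteration. This is the communication cost that the integration and broadcasting algorithms address.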