In this paper, we propose a parallel architecture for rule-based simulation applications. The architecture exploits the data parallelism inherent in the application to achieve close to linear speedup, and it also reduces the semantic gap between an application and its implementation. We describe a top-down design methodology for implementing the architecture on a commercially available multicomputer, the Intel iPSC/2 hypercube, and we implemented several hardware rule-based simulation applications on that machine. Our performance measurements indicate that close to linear speedups are possible when a high-level parallel architecture is applied at the knowledge representation level. The measured speedups were obtained on a 16-node hypercube, the largest configuration available on our campus. To evaluate the speedup behavior of the simulation applications for larger numbers of processors, we also built a simulation model in SIMSCRIPT and observed close to linear speedups for several example applications. Using the simulation model, we investigated synchronization methods and partitioning strategies, and performed a sensitivity analysis on several key architectural issues.