This paper presents an efficient parallel hardware algorithm for the prefix computation. Because the proposed scheme is based on dataflow, it requires neither preprocessing time nor memory to store the data, and it is suitable for VLSI implementation. A linear systolic array architecture built from simple basic cells is presented. To control the degree of parallelism, the design splits the input and output into multiple sub-streams: it receives multiple input streams of elements in parallel and produces multiple output streams in parallel. Because the degree of parallelism is controllable, the design can be fitted to the resource constraints of the target system. The time complexity of the design is O(d + (N-d)/d), where d is the degree of parallelism and N is the stream size. When the stream is very long, the initial trigger time d can be ignored and the complexity becomes O(N/d). When enough resources are available, the optimal degree of parallelism is d = N^{1/2}, the value that minimizes d + (N-d)/d. The proposed design can also operate on input streams of unbounded length.
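
As an illustration of the trade-off described above, the following is a minimal sketch that evaluates the stated cost expression d + (N-d)/d and searches for the integer degree of parallelism that minimizes it. It models only the formula, not the systolic array itself; the function names `step_count` and `best_degree` are hypothetical and chosen for this example.

```python
import math


def step_count(n: int, d: int) -> float:
    """Cost model taken from the abstract: d + (n - d) / d.

    n is the stream size and d the degree of parallelism.
    This is only the formula, not a simulation of the hardware.
    """
    if not 1 <= d <= n:
        raise ValueError("d must satisfy 1 <= d <= n")
    return d + (n - d) / d


def best_degree(n: int) -> int:
    """Brute-force search for the integer d that minimizes step_count."""
    return min(range(1, n + 1), key=lambda d: step_count(n, d))


if __name__ == "__main__":
    n = 10_000
    d_star = best_degree(n)
    # Compare the brute-force minimizer with the integer square root of n.
    print(d_star, math.isqrt(n), step_count(n, d_star))
```

For stream sizes that are perfect squares, the brute-force minimizer coincides with the square root of N, which agrees with setting the derivative of d + N/d - 1 with respect to d to zero.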