Scan, also known as parallel prefix, is a fundamental and useful operation in parallel programming. We will gain experience building the Hillis & Steele scan, with an optional work-efficient Blelloch scan.
Further, the dependencies in scan, where each output element depends on the previous output element, make it seem to have little hope for parallelism. However, simple yet clever approaches can achieve logarithmic critical path lengths.
While we will only implement prefix sum, scan works with any associative operation (for example, product or max).
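To make the dependency chain concrete, here is a minimal sequential sketch, assuming an inclusive scan (element i includes xs[i]). The name sequential_inclusive_scan is hypothetical and not part of the assignment. Each iteration reads the previous output, so the loop is a chain of n - 1 dependent steps, and passing a different associative operator changes what is scanned.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def sequential_inclusive_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i].

    Each step reads out[i - 1], so the steps cannot overlap:
    the critical path grows linearly with len(xs).
    """
    out: List[T] = []
    for i, x in enumerate(xs):
        out.append(x if i == 0 else op(out[i - 1], x))
    return out


print(sequential_inclusive_scan([3, 1, 4, 1, 5], operator.add))  # [3, 4, 8, 9, 14]
print(sequential_inclusive_scan([3, 1, 4, 1, 5], max))           # [3, 3, 4, 4, 5]
```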
Code To Investigate
Code To Implement
Hillis and Steele Parallel Scan
One downside of parallel scan is that it requires additional memory. For our scans we add the requirement that we will not mutate the incoming array parameter. We could create log(n) arrays, one for each level, but that would be wasteful. If instead we create two array buffers and, at each power-of-two level of the scan, switch which one is the source to read from and which is the destination to write to, we should be good to go.
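A minimal Python sketch of the two-buffer Hillis & Steele scan, assuming an inclusive prefix sum over a list of ints. The function name hillis_steele_scan is hypothetical, not the assignment's API. A ThreadPoolExecutor stands in for a parallel forall over the indices at each level; because of CPython's GIL this illustrates the structure rather than delivering a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def hillis_steele_scan(xs: List[int]) -> List[int]:
    """Inclusive prefix sum via Hillis & Steele, without mutating xs.

    Two buffers are used: at each power-of-two offset one is read (src)
    and the other is written (dst), then their roles swap.
    """
    n = len(xs)
    src = list(xs)  # copy so the caller's list is never modified
    dst = [0] * n
    offset = 1
    with ThreadPoolExecutor() as pool:
        while offset < n:
            def step(i: int, src=src, dst=dst, offset=offset) -> None:
                # Every update at this level reads only src and writes a
                # distinct dst[i], so all n updates are independent.
                dst[i] = src[i] + src[i - offset] if i >= offset else src[i]

            list(pool.map(step, range(n)))  # wait for the whole level
            src, dst = dst, src  # this level's output feeds the next level
            offset *= 2
    return src  # after the final swap, src holds the latest level


print(hillis_steele_scan([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

Each of the ceil(log2 n) levels performs O(n) additions, so total work is O(n log n) while the critical path is O(log n).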
The table below shows which array will be passed back for each offset if nextSrcAndDst is used appropriately in a parallel scan.
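The actual signature of nextSrcAndDst is not shown here, so the sketch below uses a hypothetical stand-in, src_and_dst_for_level, assuming buffer a starts out holding a copy of the input and even levels read a and write b. It traces which buffer is written at each power-of-two offset.

```python
from typing import List, Tuple


def src_and_dst_for_level(
    level: int, a: List[int], b: List[int]
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: even levels read a and write b; odd levels the reverse."""
    return (a, b) if level % 2 == 0 else (b, a)


def result_buffer_trace(n: int) -> List[Tuple[int, str]]:
    """For each power-of-two offset < n, report which buffer ('a' or 'b') is written."""
    a, b = [0] * n, [0] * n
    trace = []
    offset, level = 1, 0
    while offset < n:
        _, dst = src_and_dst_for_level(level, a, b)
        trace.append((offset, "a" if dst is a else "b"))
        offset *= 2
        level += 1
    return trace


print(result_buffer_trace(16))  # [(1, 'b'), (2, 'a'), (4, 'b'), (8, 'a')]
```

The finished scan lives in whichever buffer was written at the last offset, which depends on the parity of the number of levels, so a scan should return that buffer rather than a fixed one.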
(Optional) Blelloch Work Efficient Scan
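As a point of reference, here is a minimal Python sketch of Blelloch's up-sweep/down-sweep scan, assuming an inclusive prefix sum over ints. It pads the input to a power of two with zeros (the identity for addition), computes an exclusive scan in a scratch array, then adds each input element back. The name blelloch_scan is hypothetical, and the inner loops are written sequentially even though the iterations within one stride are independent and could run in parallel.

```python
from typing import List


def blelloch_scan(xs: List[int]) -> List[int]:
    """Inclusive prefix sum via up-sweep/down-sweep: O(n) work, O(log n) span."""
    n = len(xs)
    if n == 0:
        return []
    size = 1
    while size < n:
        size *= 2
    tree = list(xs) + [0] * (size - n)  # scratch copy; xs is not mutated

    # Up-sweep: accumulate partial sums up an implicit balanced tree.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):  # independent within one d
            tree[i] += tree[i - d]
        d *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    tree[size - 1] = 0
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):  # independent within one d
            left = tree[i - d]
            tree[i - d] = tree[i]
            tree[i] += left
        d //= 2

    # tree[:n] now holds the exclusive scan; add xs[i] to make it inclusive.
    return [tree[i] + xs[i] for i in range(n)]


print(blelloch_scan([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

Both sweeps together perform O(n) additions, compared with O(n log n) for Hillis & Steele, at the cost of roughly twice as many levels.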
Testing Your Solution
Optional Work Efficient