Scan
Latest revision as of 03:21, 18 August 2023
Motivation
Scan, also known as parallel prefix, is a fundamental and useful operation in parallel programming. We will gain experience building the Hillis and Steele scan, with the work-efficient Blelloch scan as an optional challenge.
Further, the dependencies in scan:
$y_i = y_{i-1} + x_i$
make it seem to have little hope for parallelism. However, simple yet clever approaches can achieve $\log^k n$ critical path lengths.
While we will simply implement prefix sum, scan can be used for other associative operations.
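The dependency y_i = y_{i-1} + x_i can be read directly as a sequential loop. The following is a minimal sketch of an inclusive scan, not part of the course code: the class name `SequentialScanSketch` and the method `inclusiveScan` are illustrative assumptions. Passing the operator as a parameter shows how the same loop applies to other associative operations, such as max.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

class SequentialScanSketch {
    // Inclusive scan: y[0] = x[0], and y[i] = op(y[i-1], x[i]) for i > 0.
    // The input array is read but never modified.
    static int[] inclusiveScan(int[] x, IntBinaryOperator op) {
        int[] y = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = (i == 0) ? x[i] : op.applyAsInt(y[i - 1], x[i]);
        }
        return y;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        // Prefix sum.
        System.out.println(Arrays.toString(inclusiveScan(data, Integer::sum)));
        // Running maximum, another associative operation.
        System.out.println(Arrays.toString(inclusiveScan(data, Math::max)));
    }
}
```

Each output element depends on the one before it, which is why this loop has a critical path of length n and why the parallel algorithms restructure the computation.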
Background
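Prefix Sum: https://en.wikipedia.org/wiki/Prefix_sum
Hillis and Steele Algorithm: https://en.wikipedia.org/wiki/Prefix_sum#Algorithm_1:_Shorter_span,_more_parallel
Data parallel algorithms: https://dl.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=7903
Video: Hillis and Steele Scan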
Lecture
Slides: https://docs.google.com/presentation/d/1xUsx7-n6Ocvm2pQSQX3canzxqYsMo-xka5bw7qFugNk/pub
Visualization
class: ScanViz.java
package: scan.viz
source folder: student/src/main/java
Warmup
Sequential Sum Scan
Client
class: StepEfficientParallelSumScannerClient.java
package: scan.client
source folder: student/src/main/java
StepEfficientParallelSumScannerClient:
OutOfPlaceSumScanner sumScanner = new StepEfficientParallelSumScanner();
int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
System.out.println(Arrays.toString(data));
int[] result = sumScanner.sumScan(data);
System.out.println(Arrays.toString(result));
StepEfficientParallelSumScannerClient Output:
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 3, 6, 10, 15, 21, 28, 36]
Code To Use
PowersOf2Iterable
The PowersOf2Iterable exercise should come in handy here. Recall that we implemented a PowersOfTwoLessThan class in that exercise.
for (int v : new PowersOfTwoLessThan(71)) {
    System.out.println(v);
}
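For reference, the values such a loop would visit can be sketched without the exercise class. The helper below is a hypothetical stand-in, not the `PowersOfTwoLessThan` implementation from the exercise, and it assumes that "less than" means a strict upper bound.

```java
import java.util.ArrayList;
import java.util.List;

class PowersOfTwoSketch {
    // Hypothetical stand-in: collects 1, 2, 4, ... strictly below maxExclusive.
    // A long counter avoids int overflow when maxExclusive is near Integer.MAX_VALUE.
    static List<Integer> powersOfTwoLessThan(int maxExclusive) {
        List<Integer> powers = new ArrayList<>();
        for (long p = 1; p < maxExclusive; p *= 2) {
            powers.add((int) p);
        }
        return powers;
    }

    public static void main(String[] args) {
        // Prints [1, 2, 4, 8, 16, 32, 64]
        System.out.println(powersOfTwoLessThan(71));
    }
}
```

Under that assumption, a bound of 71 yields seven powers of two, one per level of a scan over up to 71 elements.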
PhasableIntArrays
class PhasableIntArrays extends AbstractPhasable
One of the downsides of parallel scan is that it requires additional memory. For our scans we add the further requirement that we will not mutate the incoming array parameter. We could create log(n) arrays, one for each level, but that would be wasteful. Instead, we can create two array buffers and, at each power-of-two level of the scan, switch which one is the source to read from and which one is the destination to write to. However, reassigning two buffer variables back and forth as they switch between source and destination roles raises finality issues when attempting to access them inside of lambdas, because a local variable captured by a lambda must be final or effectively final.
This is what PhasableIntArrays helps us with. Its srcForPhase(int phaseIndex) and dstForPhase(int phaseIndex) methods return the source and destination arrays corresponding to the given phase.
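The finality issue can be seen in a small, self-contained sketch. This is an illustration only: it uses Java's standard `IntStream` as a stand-in for the course's parallel constructs, and the class name `LambdaFinalitySketch` and method `doubled` are made up for this example.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class LambdaFinalitySketch {
    static int[] doubled(int[] src) {
        int[] dst = new int[src.length];

        // This compiles because src and dst are never reassigned,
        // so both are effectively final and may be captured by the lambda.
        IntStream.range(0, src.length).parallel().forEach(i -> dst[i] = src[i] * 2);

        // Swapping the roles by reassignment anywhere in this method, e.g.
        //   int[] tmp = src; src = dst; dst = tmp;
        // would make javac reject the lambda above with:
        //   "local variables referenced from a lambda expression
        //    must be final or effectively final"
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubled(new int[] {1, 2, 3, 4})));
    }
}
```

Asking an object for the arrays by phase index avoids the swap, since no local array variable is ever reassigned.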
The table below shows which array (the incoming data or one of the two buffer arrays, a or b) is returned for each phase index as the source or the destination array.
phase | src | dst | power of 2 |
---|---|---|---|
0 | data | a | 1 |
1 | a | b | 2 |
2 | b | a | 4 |
3 | a | b | 8 |
4 | b | a | 16 |
5 | a | b | 32 |
6 | b | a | 64 |
NOTE: think about which array is the correct one to return from your scan method given how you write your code.
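The src/dst pattern in the table can be reproduced with a small stand-in class. This is a sketch under assumptions: `TwoBufferPhasesSketch` is hypothetical and is not the course's `PhasableIntArrays`, whose constructor and internals are not shown on this page. Only the even/odd alternation is taken from the table.

```java
class TwoBufferPhasesSketch {
    private final int[] data; // the caller's array, never written to
    private final int[] a;
    private final int[] b;

    TwoBufferPhasesSketch(int[] data) {
        this.data = data;
        this.a = new int[data.length];
        this.b = new int[data.length];
    }

    // Even phases write to a, odd phases write to b.
    int[] dstForPhase(int phaseIndex) {
        return (phaseIndex % 2 == 0) ? a : b;
    }

    // Phase 0 reads the untouched input; every later phase reads
    // whatever the previous phase wrote.
    int[] srcForPhase(int phaseIndex) {
        return (phaseIndex == 0) ? data : dstForPhase(phaseIndex - 1);
    }

    // Helper for printing: identifies an array by reference.
    String nameOf(int[] array) {
        return array == data ? "data" : (array == a ? "a" : "b");
    }

    public static void main(String[] args) {
        TwoBufferPhasesSketch phases = new TwoBufferPhasesSketch(new int[8]);
        for (int phase = 0; phase <= 6; phase++) {
            System.out.println(phase + " | " + phases.nameOf(phases.srcForPhase(phase))
                    + " | " + phases.nameOf(phases.dstForPhase(phase))
                    + " | " + (1 << phase));
        }
    }
}
```

Because the buffers are selected by method call rather than by reassigning local variables, a lambda only needs to capture the (effectively final) object reference and a phase index that is itself effectively final within the loop body.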
Code To Implement
class: StepEfficientParallelSumScanner.java
methods: sumScan
package: scan.exercise
source folder: student/src/main/java
method: int[] sumScan(int[] data)
(parallel implementation required)
Challenge Problem
Blelloch (Work-efficient) Scan
Testing Your Solution
class: __ScanTestSuite.java
package: scan.exercise
source folder: testing/src/test/java
Pledge, Acknowledgments, Citations
file: exercise-scan-pledge-acknowledgments-citations.txt
More info about the Honor Pledge