Difference between revisions of "Scan"

From CSE231 Wiki
(Added pack)
 
(53 intermediate revisions by 4 users not shown)
Line 6: Line 6:
 
: <math>y_i = y_{i-1} + x_i</math>
 
: <math>y_i = y_{i-1} + x_i</math>
  
make it seem to have little hope for parallelism.  However, simple yet clever approaches can achieve <math>\log n</math> critical path lengths.
+
make it seem to have little hope for parallelism.  However, simple yet clever approaches can achieve <math>\log ^k n</math> critical path lengths.
  
 
While we will simply implement prefix sum, scan can be used for other associative operations.
 
While we will simply implement prefix sum, scan can be used for other associative operations.
Line 13: Line 13:
 
[https://en.wikipedia.org/wiki/Prefix_sum Prefix Sum]
 
[https://en.wikipedia.org/wiki/Prefix_sum Prefix Sum]
  
==Hillis-Steele Prefix Sum==
 
 
[https://en.wikipedia.org/wiki/Prefix_sum#Algorithm_1:_Shorter_span,_more_parallel Hillis and Steele Algorithm]
 
[https://en.wikipedia.org/wiki/Prefix_sum#Algorithm_1:_Shorter_span,_more_parallel Hillis and Steele Algorithm]
  
<youtube>RdfmxfZBHpo</youtube>
+
[https://dl.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=7903 Data parallel algorithms]
 +
 
 +
{{CollapsibleYouTube|Hillis and Steele Scan|<youtube>RdfmxfZBHpo</youtube>}}
  
 
[[File:Hillis-Steele_Prefix_Sum.svg]]
 
[[File:Hillis-Steele_Prefix_Sum.svg]]
  
==Pack==
+
=Lecture=
One of the applications for scan is pack operation. Given an input array, the operation produces an output array containing only the elements that satisfy some specified predicate.
+
[https://docs.google.com/presentation/d/1xUsx7-n6Ocvm2pQSQX3canzxqYsMo-xka5bw7qFugNk/pub slides]
 +
 
 +
<youtube>RmekgaW8X8A</youtube>
 +
 
 +
=Visualization=
 +
{{Viz|ScanViz|scan.viz|main}}
  
The problem with parallelizing pack is that although it is easy to determine whether an element should be filtered out into the output, we can't know where to put the element in the output array. It seems that placing an element into the output requires knowledge of the placement of the previous elements. 
+
[[File:Step_Efficient_Scan_Viz.png]]
  
This is where prefix sum becomes very useful. For example, if you have an integer array input:
+
=Warmup=
{| class="wikitable"
+
[[Sequential_Scan_Assignment|Sequential Sum Scan]]
|2||6||1||3||7||9||4
 
|}
 
You want to keep only the elements that are less than five. First create a flag array in which every position i where input[i] is less than 5 is flagged as "1" and every other position is marked as "0".
 
{| class="wikitable"
 
|1||0||1||1||0||0||1
 
|}
 
The prefix sum of this flag array is:
 
{| class="wikitable"
 
|1||1||2||3||3||3||4
 
|}
 
Notice how each position that was flagged now has a distinct number assigned to it in the prefix sum array. That number, minus one, is the index at which the flagged element belongs in the output array.
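
The three steps above (flag, prefix sum, scatter) can be sketched in a few lines of Java. This is a minimal sequential illustration, not the course's <code>pack</code> implementation: the class name <code>PackSketch</code> and its <code>int[]</code>-based signature are assumptions made for this example, and the prefix sum is computed with a plain loop where a parallel scan would normally be used.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical sketch class; not part of the course code base.
class PackSketch {
    // Returns the elements of input that satisfy predicate, in their original order.
    static int[] pack(int[] input, IntPredicate predicate) {
        int n = input.length;
        if (n == 0) {
            return new int[0];
        }
        // Step 1: flag each position whose element should be kept.
        int[] flags = new int[n];
        for (int i = 0; i < n; i++) {
            flags[i] = predicate.test(input[i]) ? 1 : 0;
        }
        // Step 2: inclusive prefix sum of the flags.
        // (Sequential here; this is the step a parallel scan would replace.)
        int[] sums = new int[n];
        sums[0] = flags[0];
        for (int i = 1; i < n; i++) {
            sums[i] = sums[i - 1] + flags[i];
        }
        // Step 3: each flagged element lands at index sums[i] - 1.
        // Every iteration writes a distinct slot, so this loop has no dependencies.
        int[] output = new int[sums[n - 1]];
        for (int i = 0; i < n; i++) {
            if (flags[i] == 1) {
                output[sums[i] - 1] = input[i];
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[] input = {2, 6, 1, 3, 7, 9, 4};
        System.out.println(Arrays.toString(pack(input, v -> v < 5))); // prints [2, 1, 3, 4]
    }
}
```

With the example input, the flags are 1, 0, 1, 1, 0, 0, 1 and the prefix sums are 1, 1, 2, 3, 3, 3, 4, matching the tables above.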
 
  
==(Optional) Work-efficient Blelloch Scan==
+
=Client=
[https://en.wikipedia.org/wiki/Prefix_sum#Algorithm_2:_Work-efficient Blelloch Algorithm]
+
{{Client|StepEfficientParallelSumScannerClient|scan.client|main}}
  
<youtube>mmYv3Haj6uc</youtube>
+
{{CollapsibleCode|StepEfficientParallelSumScannerClient|
 +
<syntaxhighlight lang="java">
 +
OutOfPlaceSumScanner sumScanner = new StepEfficientParallelSumScanner();
 +
int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
 +
System.out.println(Arrays.toString(data));
 +
int[] result = sumScanner.sumScan(data);
 +
System.out.println(Arrays.toString(result));
 +
</syntaxhighlight>}}
  
[[File:Prefix sum 16.svg]]
+
{{CollapsibleConsole|StepEfficientParallelSumScannerClient Output|<pre style="border: 0px; background: #000; color:#fff;">[1, 2, 3, 4, 5, 6, 7, 8]
 +
[1, 3, 6, 10, 15, 21, 28, 36]</pre>}}
  
=Code To Investigate=
+
=Code To Use=
class [https://www.cse.wustl.edu/~cosgroved/courses/cse231/s18/apidocs/scan/core/ArraysHolder.html ArraysHolder]
+
==[[PowersOf2Iterable]]==
:[https://www.cse.wustl.edu/~cosgroved/courses/cse231/s18/apidocs/scan/core/ArraysHolder.html#getSrc-- getSrc()]
+
This [[PowersOf2Iterable|exercise]] should come in handy here. Recall that we implemented a <code>PowersOfTwoLessThan</code> class in this exercise.
:[https://www.cse.wustl.edu/~cosgroved/courses/cse231/s18/apidocs/scan/core/ArraysHolder.html#getDst-- getDst()]
 
:[https://www.cse.wustl.edu/~cosgroved/courses/cse231/s18/apidocs/scan/core/ArraysHolder.html#nextSrcAndDst-- nextSrcAndDst()]
 
  
class [https://www.cse.wustl.edu/~cosgroved/courses/cse231/s18/apidocs/scan/core/PowersOfTwoLessThan.html PowersOfTwoLessThan] implements [https://docs.oracle.com/javase/8/docs/api/java/lang/Iterable.html Iterable<Integer>]
+
<syntaxhighlight lang="java">
 +
for(int v : new PowersOfTwoLessThan(71)) {
 +
    System.out.println(v);
 +
}
 +
</syntaxhighlight>
  
=Code To Implement=
+
==PhasableIntArrays==
==Sequential Scan==
+
class [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/phasable/PhasableIntArrays.html PhasableIntArrays] extends [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/phasable/AbstractPhasable.html AbstractPhasable]
{{CodeToImplement|SequentialScan|sumScan|scan.studio}}
+
: [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/phasable/AbstractPhasable.html#srcForPhase(int) srcForPhase(phaseIndex)]
 +
: [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/phasable/AbstractPhasable.html#dstForPhase(int) dstForPhase(phaseIndex)]
  
{{Sequential|public int[] sumScan(int[] data)}}
+
One downside of parallel scan is that it requires additional memory.  For our scans we add the requirement that we will not mutate the incoming array parameter.  We could create log(n) arrays, one for each level, but that would be wasteful.  If we instead create two array buffers and, at each power-of-two level of the scan, switch which is the source to read from and which is the destination to write to, we should be good to go.  However, re-assigning two buffer variables back and forth as they switch between source and destination roles means they are not effectively final, so they cannot be accessed directly inside lambdas.
  
==Default Arrays Holder==
+
This is what <code>PhasableIntArrays</code> helps us with. By using its <code>dstForPhase(int phaseIndex)</code> and <code>srcForPhase(int phaseIndex)</code>, we can get the source and destination array corresponding to the phase.
One downside of parallel scan is that it requires additional memory. For our scans we add the requirement that we will not mutate the incoming array parameter.  We could create log(n) arrays, one for each level, but that would be wasteful.  If we instead create two array buffers and, at each power-of-two level of the scan, switch which is the source to read from and which is the destination to write to, we should be good to go.
 
  
The table below shows which array will be passed back for each offset if <code>nextSrcAndDst</code> is used appropriately from a parallel scan.
+
The table below shows which of the two buffer arrays, a or b, will be passed back for each phase index as the source or the destination array.
 
{|class="wikitable"
 
{|class="wikitable"
! ||1||2||4||8||16
+
!'''phase''' !! src !! dst !! power of 2
 
|-
 
|-
|'''src:'''||data||a||b||a||b
+
|0||data||a||1
 
|-
 
|-
|'''dst:'''||a||b||a||b||a
+
|1||a||b||2
 +
|-
 +
|2||b||a||4
 +
|-
 +
|3||a||b||8
 +
|-
 +
|4||b||a||16
 +
|-
 +
|5||a||b||32
 +
|-
 +
|6||b||a||64
 
|}
 
|}
  
 
'''NOTE:''' think about which array is the correct one to return from your scan method given how you write your code.
 
'''NOTE:''' think about which array is the correct one to return from your scan method given how you write your code.
  
{{CodeToImplement|DefaultArraysHolder|getSrc<br>getDst<br>nextSrcAndDst<br>size|scan.studio}}
+
=Code To Implement=
 +
{{CodeToImplement|StepEfficientParallelSumScanner|sumScan|scan.exercise}}
  
==Hillis and Steele Parallel Scan==
+
{{Parallel|int[] sumScan(int[] data)}}
 +
=Challenge Problem=
 +
[[Work_Efficient_Parallel_Scan_Assignment|Blelloch (Work-efficient) Scan]]
  
{{CodeToImplement|ParallelScan|sumScan|scan.studio}}
+
=Testing Your Solution=
 
+
{{TestSuite|__ScanTestSuite|scan.exercise}}
{{Parallel|public int[] sumScan(int[] data)}}
 
 
 
==Parallel Pack==
 
 
 
{{CodeToImplement|ParallelPack|pack|pack.studio}}
 
  
{{Parallel|public static <T> T[] pack(Class<T[]> arrayType, T[] arr, Predicate<T> predicate)}}
+
=Pledge, Acknowledgments, Citations=
 
+
{{Pledge|exercise-scan}}
==(Optional) Blelloch Work Efficient Scan==
 
{{CodeToImplement|WorkEfficientScan|sumScan|scan.challenge}}
 
 
 
{{Parallel|public int[] sumScan(int[] data)}}
 
 
 
=Testing Your Solution=
 
==Correctness==
 
===Required===
 
{{TestSuite|ScanTestSuite|scan.studio}}
 
{{TestSuite|PackTestSuite|pack.studio}}
 
===Optional Work Efficient===
 
{{TestSuite|WorkEfficientScanTestSuite|scan.challenge}}
 

Latest revision as of 03:21, 18 August 2023

Motivation

Scan, also known as parallel prefix, is a fundamental and useful operation in parallel programming. We will gain experience in building Hillis & Steele scan with an optional work-efficient Blelloch scan.

Further, the dependencies in scan:

    y_i = y_{i-1} + x_i

make it seem to have little hope for parallelism. However, simple yet clever approaches can achieve log^k n critical path lengths.

While we will simply implement prefix sum, scan can be used for other associative operations.
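
To make the recurrence and the remark about other associative operations concrete, here is a minimal sequential sketch. The class name SequentialScanSketch and the use of java.util.function.IntBinaryOperator are assumptions made for this illustration; they are not taken from the course code.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch class; not part of the course code base.
class SequentialScanSketch {
    // Inclusive scan: result[i] = op(result[i-1], data[i]), with result[0] = data[0].
    // The input array is not mutated.
    static int[] scan(int[] data, IntBinaryOperator op) {
        int[] result = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (i == 0) ? data[0] : op.applyAsInt(result[i - 1], data[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        // Prefix sum: y_i = y_{i-1} + x_i
        System.out.println(Arrays.toString(scan(data, Integer::sum)));
        // prints [1, 3, 6, 10, 15, 21, 28, 36]

        // Any associative operation works the same way, for example a running maximum.
        int[] other = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(Arrays.toString(scan(other, Math::max)));
        // prints [3, 3, 4, 4, 5, 9, 9, 9]
    }
}
```

Each iteration reads the result of the previous one, which is exactly the dependency chain that makes the sequential form look hard to parallelize.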

Background

Prefix Sum

Hillis and Steele Algorithm

Data parallel algorithms

Video: Hillis and Steele Scan  

Hillis-Steele Prefix Sum.svg

Lecture

slides

Visualization

class: ScanViz.java VIZ
package: scan.viz
source folder: student/src/main/java

Step Efficient Scan Viz.png

Warmup

Sequential Sum Scan

Client

class: StepEfficientParallelSumScannerClient.java CLIENT
package: scan.client
source folder: student/src/main/java
StepEfficientParallelSumScannerClient  
OutOfPlaceSumScanner sumScanner = new StepEfficientParallelSumScanner();
int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
System.out.println(Arrays.toString(data));
int[] result = sumScanner.sumScan(data);
System.out.println(Arrays.toString(result));
StepEfficientParallelSumScannerClient Output  
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 3, 6, 10, 15, 21, 28, 36]

Code To Use

PowersOf2Iterable

This exercise should come in handy here. Recall that we implemented a PowersOfTwoLessThan class in this exercise.

for(int v : new PowersOfTwoLessThan(71)) {
    System.out.println(v);
}
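
For readers without the earlier exercise at hand, the following stand-in shows the behavior the loop above presumably relies on. It is a sketch under assumptions: the class name PowersOfTwoLessThanSketch is hypothetical, and it assumes the iteration starts at 1 and stops strictly below the bound, which is not confirmed by this page.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the exercise's PowersOfTwoLessThan class.
// Assumption: yields 1, 2, 4, ... for every power of two strictly less than the bound.
class PowersOfTwoLessThanSketch implements Iterable<Integer> {
    private final int maxExclusive;

    PowersOfTwoLessThanSketch(int maxExclusive) {
        this.maxExclusive = maxExclusive;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int nextPower = 1;

            @Override
            public boolean hasNext() {
                // nextPower > 0 guards against int overflow after 2^30.
                return nextPower > 0 && nextPower < maxExclusive;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int current = nextPower;
                nextPower <<= 1;
                return current;
            }
        };
    }

    public static void main(String[] args) {
        for (int v : new PowersOfTwoLessThanSketch(71)) {
            System.out.println(v); // under the stated assumption: 1, 2, 4, 8, 16, 32, 64
        }
    }
}
```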

PhasableIntArrays

class PhasableIntArrays extends AbstractPhasable

srcForPhase(phaseIndex)
dstForPhase(phaseIndex)

One downside of parallel scan is that it requires additional memory. For our scans we add the requirement that we will not mutate the incoming array parameter. We could create log(n) arrays, one for each level, but that would be wasteful. If we instead create two array buffers and, at each power-of-two level of the scan, switch which is the source to read from and which is the destination to write to, we should be good to go. However, re-assigning two buffer variables back and forth as they switch between source and destination roles means they are not effectively final, so they cannot be accessed directly inside lambdas.
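
The lambda problem can be seen in a small sketch. It uses java.util.stream.IntStream as a stand-in for the course's parallel loop constructs (an assumption), and the class and method names are hypothetical. The per-element work is a simple doubling rather than a scan step, so that only the buffer handling is on display.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch class; not part of the course code base.
class LambdaCaptureSketch {
    // Doubles every element `passes` times, ping-ponging between two buffers.
    // The incoming array is never written to.
    static int[] doubleRepeatedly(int[] data, int passes) {
        int[] a = new int[data.length];
        int[] b = new int[data.length];
        int[] src = data;
        int[] dst = a;
        for (int pass = 0; pass < passes; pass++) {
            // src and dst are reassigned at the bottom of this loop, so they are not
            // effectively final. Referring to them directly inside the lambda, as in
            //     i -> dst[i] = src[i] * 2
            // is rejected by the compiler. Copying them into fresh final locals for
            // each pass is one workaround.
            final int[] currentSrc = src;
            final int[] currentDst = dst;
            IntStream.range(0, data.length).parallel()
                     .forEach(i -> currentDst[i] = currentSrc[i] * 2);
            // Swap roles: what was just written becomes the next source.
            src = currentDst;
            dst = (currentDst == a) ? b : a;
        }
        return src;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(Arrays.toString(doubleRepeatedly(data, 3))); // prints [8, 16, 24]
        System.out.println(Arrays.toString(data));                      // prints [1, 2, 3]
    }
}
```

Selecting the buffers from a phase index, instead of swapping variables, avoids the extra locals because the index itself can be the only value the lambda needs.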

This is what PhasableIntArrays helps us with. Its srcForPhase(int phaseIndex) and dstForPhase(int phaseIndex) methods return the source and destination arrays for a given phase.

The table below shows which of the two buffer arrays, a or b, will be passed back for each phase index as the source or the destination array.

phase | src  | dst | power of 2
0     | data | a   | 1
1     | a    | b   | 2
2     | b    | a   | 4
3     | a    | b   | 8
4     | b    | a   | 16
5     | a    | b   | 32
6     | b    | a   | 64

NOTE: think about which array is the correct one to return from your scan method given how you write your code.
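
The mapping in the table can be reproduced with a few lines of index arithmetic. The class below is a hypothetical illustration written for this page; it borrows the method names srcForPhase and dstForPhase for readability but makes no claim about how PhasableIntArrays is actually implemented.

```java
// Hypothetical sketch class; not the real PhasableIntArrays.
class TwoBufferSketch {
    private final int[] data; // the incoming array, only ever read
    private final int[] a;    // first scratch buffer
    private final int[] b;    // second scratch buffer

    TwoBufferSketch(int[] data) {
        this.data = data;
        this.a = new int[data.length];
        this.b = new int[data.length];
    }

    // Phase 0 reads the original data; afterwards each phase reads
    // whatever the previous phase wrote.
    int[] srcForPhase(int phaseIndex) {
        if (phaseIndex == 0) {
            return data;
        }
        return (phaseIndex % 2 == 1) ? a : b;
    }

    // Even phases write to a, odd phases write to b.
    int[] dstForPhase(int phaseIndex) {
        return (phaseIndex % 2 == 0) ? a : b;
    }

    // Helper for printing which array a reference points to.
    String nameOf(int[] array) {
        if (array == data) {
            return "data";
        }
        return (array == a) ? "a" : "b";
    }

    public static void main(String[] args) {
        TwoBufferSketch buffers = new TwoBufferSketch(new int[] {1, 2, 3, 4, 5, 6, 7, 8});
        int powerOfTwo = 1;
        for (int phase = 0; phase <= 6; phase++) {
            System.out.println(phase + " "
                    + buffers.nameOf(buffers.srcForPhase(phase)) + " "
                    + buffers.nameOf(buffers.dstForPhase(phase)) + " "
                    + powerOfTwo);
            powerOfTwo *= 2;
        }
    }
}
```

Because the arrays are looked up from the phase index, a lambda only needs to capture that index (or final locals derived from it), and the source of each phase is always the destination of the phase before it.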

Code To Implement

class: StepEfficientParallelSumScanner.java
methods: sumScan
package: scan.exercise
source folder: student/src/main/java

method: int[] sumScan(int[] data) (parallel implementation required)

Challenge Problem

Blelloch (Work-efficient) Scan

Testing Your Solution

class: __ScanTestSuite.java
package: scan.exercise
source folder: testing/src/test/java

Pledge, Acknowledgments, Citations

file: exercise-scan-pledge-acknowledgments-citations.txt

More info about the Honor Pledge