Difference between revisions of "Scan"

From CSE231 Wiki
Revision as of 01:29, 7 April 2020

Motivation

Scan, also known as parallel prefix, is a fundamental and useful operation in parallel programming. We will gain experience building the Hillis & Steele scan, with an optional work-efficient Blelloch scan.

At first glance, the dependencies in scan:

y_i = y_{i-1} + x_i

seem to leave little hope for parallelism. However, simple yet clever approaches can achieve critical path lengths of O(log n).

While we will simply implement prefix sum, scan can be used for other associative operations.
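To illustrate that point, here is a minimal sequential sketch of an inclusive scan that takes the combining operation as a parameter. The class name AssociativeScanSketch and the method scanInclusive are hypothetical and are not part of the course code; the sketch only assumes that the operator passed in is associative.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical illustration class; not part of the course repository.
final class AssociativeScanSketch {
    // Inclusive scan: out[i] = data[0] op data[1] op ... op data[i].
    // The input array is only read, never modified.
    static int[] scanInclusive(int[] data, IntBinaryOperator op) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            // This is the recurrence y_i = y_{i-1} op x_i, with y_0 = x_0.
            out[i] = (i == 0) ? data[0] : op.applyAsInt(out[i - 1], data[i]);
        }
        return out;
    }
}
```

Passing Integer::sum gives the prefix sum, while passing Math::max gives a running maximum; for the input 2, 6, 1, 3, 7, 9, 4 these produce 2, 8, 9, 12, 19, 28, 32 and 2, 6, 6, 6, 7, 9, 9 respectively.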

Background

Prefix Sum

Hillis-Steele Prefix Sum

Hillis and Steele Algorithm

Data parallel algorithms

Hillis-Steele Prefix Sum.svg

(Extra Credit) Work-efficient Blelloch Scan

Blelloch Algorithm

Scan Primitives for Vector Computers

Prefix Sums and Their Applications

Prefix sum 16.svg

Lecture

slides

Code To Use

class PhasableIntArrays extends AbstractPhasable

getSrcForPhase(phaseIndex)

PowersOf2Iterable

PhasableIntArrays

One downside of parallel scan is that it requires additional memory. For our scans we add the requirement that we will not mutate the incoming array parameter. We could create log(n) arrays, one for each level, but that would be wasteful. If we instead create two array buffers and, at each power-of-two level of the scan, switch which one is the source to read from and which one is the destination to write to, we should be good to go.

The table below shows which array will be passed back for each offset.

phase:        0      1    2    3    4
power of 2:   1      2    4    8    16
src:          data   a    b    a    b
dst:          a      b    a    b    a

NOTE: think about which array is the correct one to return from your scan method given how you write your code.
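The buffer-swapping idea can be sketched without the course's PhasableIntArrays class. The version below is only an assumption about how such a scan might be structured: the class name PingPongScanSketch is hypothetical, the two buffers a and b are managed by hand, and Java parallel streams stand in for whatever parallel constructs the course uses.

```java
import java.util.stream.IntStream;

// Hypothetical illustration class; it does not use PhasableIntArrays.
final class PingPongScanSketch {
    static int[] sumScanInclusive(int[] data) {
        int n = data.length;
        int[] a = new int[n];
        int[] b = new int[n];
        int[] src = data; // phase 0 reads the caller's array (never written)
        int[] dst = a;    // and writes into buffer a
        // "offset > 0" guards against int overflow for extremely large arrays.
        for (int offset = 1; offset > 0 && offset < n; offset *= 2) {
            final int[] s = src;
            final int[] d = dst;
            final int k = offset;
            // Every index of this phase is independent, so they can run in parallel.
            IntStream.range(0, n).parallel().forEach(i ->
                d[i] = (i >= k) ? s[i - k] + s[i] : s[i]);
            // Swap roles: what was just written becomes the next source.
            src = dst;
            dst = (dst == a) ? b : a;
        }
        // src now refers to the most recently written buffer. If no phase ran
        // (length 0 or 1), return a copy so the caller's array is not shared.
        return (src == data) ? data.clone() : src;
    }
}
```

In this sketch the array to return is whichever buffer was written last, which depends on the number of phases and therefore on the length of the input.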

PowersOf2Iterable

This studio should come in handy here.

Example usage:

for(int v : new PowersOfTwoLessThan(71)) {
    System.out.println(v);
}
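For readers who have not yet done that studio, the following is a rough sketch of what an iterable over powers of two might look like. The class name PowersOfTwoBelow is hypothetical, and the sketch assumes that "less than" is a strict bound; the course's PowersOfTwoLessThan may differ in its details.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in, not the course's PowersOfTwoLessThan class.
final class PowersOfTwoBelow implements Iterable<Integer> {
    private final int limit;

    PowersOfTwoBelow(int limit) {
        this.limit = limit;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            // A long is used so that doubling cannot overflow past the int limit.
            private long current = 1;

            @Override
            public boolean hasNext() {
                return current < limit;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int value = (int) current;
                current *= 2;
                return value;
            }
        };
    }
}
```

Under the strict-bound assumption, iterating over new PowersOfTwoBelow(71) yields 1, 2, 4, 8, 16, 32, 64.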

Code To Implement

Sequential Scan

class: SequentialScan.java
methods: sumScanInclusive
package: scan.studio
source folder: src/main/java

method: public static int[] sumScanInclusive(int[] data) (sequential implementation only)

Warning: Do NOT mutate the data parameter. Return a new array which contains the sum scan of data.

Implement the simple sequential scan algorithm with the dependencies depicted below:

Sequential scan.png

Work: N

CPL: N

Hillis and Steele Parallel Scan

class: StepEfficientParallelScan.java
methods: sumScanInclusive
package: scan.studio
source folder: src/main/java

method: private static int[] sumScanInclusive(PhasableIntArrays phasable) (parallel implementation required)

(Extra Credit) Blelloch Work Efficient Scan

class: WorkEfficientParallelScan.java
methods: sumScanExclusiveInPlace
package: scan.challenge
source folder: src/main/java

method: public static void sumScanExclusiveInPlace(int[] data) (parallel implementation required)
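As a reference point for the structure of the work-efficient algorithm, here is a sequential sketch of the up-sweep and down-sweep phases. It is written under two assumptions that may not match the assignment: the class name BlellochScanSketch is hypothetical, and the array length is required to be a power of two. The inner loops touch disjoint index pairs, so each one is where a parallel implementation would fork work.

```java
// Hypothetical illustration class; sequential, power-of-two lengths only.
final class BlellochScanSketch {
    static void sumScanExclusiveInPlace(int[] data) {
        int n = data.length;
        if (n == 0) {
            return;
        }
        if ((n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Up-sweep (reduce): build partial sums up an implicit binary tree.
        for (int stride = 1; stride < n; stride *= 2) {
            for (int i = 2 * stride - 1; i < n; i += 2 * stride) {
                data[i] += data[i - stride];
            }
        }
        // Down-sweep: clear the root, then push prefixes back down the tree.
        data[n - 1] = 0;
        for (int stride = n / 2; stride >= 1; stride /= 2) {
            for (int i = 2 * stride - 1; i < n; i += 2 * stride) {
                int left = data[i - stride];
                data[i - stride] = data[i]; // left child receives the parent's prefix
                data[i] += left;            // right child receives prefix plus left sum
            }
        }
    }
}
```

For the input 2, 6, 1, 3, 7, 9, 4, 5 the array ends up holding 0, 2, 8, 9, 12, 19, 28, 32: each position contains the sum of all elements strictly before it.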

Testing Your Solution

Correctness

Required

class: ScanTestSuite.java
package: scan.studio
source folder: src/test/java

Optional Work Efficient

class: WorkEfficientScanTestSuite.java
package: scan.challenge
source folder: src/test/java