Difference between revisions of "Understanding Fork Join"
(Created page with "== Overview == This page will walk you through reading documentation and understanding what the functions do in a practical manner. == Fork == <p>When we look at the docs, we...") |
|||
(7 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
== Overview == | == Overview == | ||
− | This page will walk you through reading documentation and understanding what the functions do in a practical manner. | + | This page will walk you through reading documentation and understanding what the functions do in a practical manner. There is an assumption that you have attended class and watched the lectures, so the terms are familiar, but there is trouble knowing what everything means/does. |
== Fork == | == Fork == | ||
+ | === Reading the Documentation === | ||
<p>When we look at the docs, we are greeted with the following:</p> | <p>When we look at the docs, we are greeted with the following:</p> | ||
− | [ | + | |
− | <p><code>static</code> means that the function is related to the class, <code>FJ</code> as opposed to a specific instance of that class. This means that we can call <code>FJ.fork</code> directly, we don't need to create an instance of that class to do so. In our code, we typically just write <code>fork</code> because Professor Cosgrove has imported the appropriate dependencies for you!</p> | + | <p><code>public static <T> [https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/concurrent/Future.html Future<T>] [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/FJ.html#fork(fj.api.TaskSupplier) fork]([https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/FJ.html#fork(fj.api.TaskSupplier) TaskSupplier<T>] task)</code></p> |
+ | |||
+ | <p>What does this mean? Let's break it down</p> | ||
+ | |||
+ | ==== Static ==== | ||
+ | <p><code>static</code> means that the function is related to the class, <code>FJ</code> as opposed to a specific instance of that class. This means that we can call <code>FJ.fork</code> directly, we don't need to create an instance of that class to do so. In our code, we typically just write <code>fork</code> because Professor Cosgrove has imported the appropriate dependencies for you! In fact, this entire library is static, so you should just be calling these functions by their function name.</p> | ||
+ | |||
+ | ==== Lambda Functions ==== | ||
+ | <p>We see that <code>fork</code> takes in one parameter of type <code>TaskSupplier<T></code>. If we follow the link, we see that a <code>TaskSupplier<T></code> is just a single function, which has a return type <code>T</code> and can throw <code>InterruptedException</code> and <code>ExecutionException</code>.</p> | ||
+ | |||
+ | <p><code>T [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/api/TaskSupplier.html#get() get()] throws [https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/lang/InterruptedException.html InterruptedException],[https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/concurrent/ExecutionException.html ExecutionException]</code></p> | ||
+ | |||
+ | <p>Thus, any function that takes in no parameters and returns anything qualifies as a <code>TaskSupplier<T></code>. In code for this course, you will often see code such as:</p> | ||
+ | {{CollapsibleCode|Lambda function |<pre>public int count(/* parameters */) { | ||
+ | Future<Integer> firstHalfFuture = fork(() -> { | ||
+ | /* Parallel task */ | ||
+ | }); | ||
+ | /* Main task */ | ||
+ | int firstHalf = join(firstHalfFuture); | ||
+ | /* more code */ | ||
+ | }</pre>}} | ||
+ | <p>This is the same as:</p> | ||
+ | {{CollapsibleCode|Explicit function |<pre>public int myFunc() { | ||
+ | /* Parallel task */ | ||
+ | } | ||
+ | |||
+ | public int count(/* parameters */) { | ||
+ | Future<Integer> firstHalfFuture = fork(() -> myFunc()); | ||
+ | /* Main task */ | ||
+ | int firstHalf = join(firstHalfFuture); | ||
+ | /* more code */ | ||
+ | }</pre>}} | ||
+ | <p>In the first case, we defined the function inline at the <code>fork</code> method call, while the second case is more akin to what you may be used to.</p> | ||
+ | |||
+ | ==== Generics ==== | ||
+ | <p>You may have the question, what is a value of type <code>T</code>? <code><T></code> means that we have a generic type called <code>T</code>. <code>T</code> isn't any one type in particular, but instead is just a guarantee that in the following code, whenever we see <code>T</code>, it is referring to the same type. An example you may be more familiar with is the <code>ArrayList<E></code> class. If we look at the [https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html documentation] for <code>ArrayList<E></code>, we see, among other things:</p> | ||
+ | |||
+ | <p><code>public class [https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html ArrayList<E>]</code></p> | ||
+ | <p><code>public boolean [https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html#add-E- add](E e)</code></p> | ||
+ | <p><code>public E [https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html#get-int- get](int index)</code></p> | ||
+ | |||
+ | <p>In the case of <code>ArrayList<E></code>, <code>E</code> is the name of the generic type. And thus, everywhere we see <code>E</code>, it is the same type <code>E</code> as was supplied to the class constructor. That is why when you create an <code>ArrayList<Integer></code>, you are able to use it for <code>Integer</code>, but not some other type such as <code>String</code>. Then, when you create an <code>ArrayList<String></code>, you are able to use it for <code>String</code>, but not other types such as <code>Integer</code>.</p> | ||
+ | <p>Coming back to the <code>fork</code> function, we next see that the return type is <code>Future<T></code>. We can follow the link to the java documentation for <code>Future</code> to see that when we call <code>Future.get()</code>, it will return a value of type <code>T</code>. Generally, in this course, we use <code>T</code> when referring to the type of the data that is passed into the parallel task, and <code>R</code> when referring to the type of the data that is returned by the parallel task.</p> | ||
+ | |||
+ | ==== Practical Explanation ==== | ||
+ | <p>So putting all of this together, the <code>fork</code> function is a <code>static</code> function that is tied to the <code>FJ</code> class, and not any specific instance of that class. It takes in a function that returns any type <code>T</code>, and itself returns a <code>Future<T></code> object, which will return a value of that same type <code>T</code> when the <code>get</code> function is called on it. The function that is passed into the <code>fork</code> function is run in parallel to the main process. Thus, code in the following structure leads to a parallel program:</p> | ||
+ | {{CollapsibleCode|Fork template |<pre>// Parallel task and Main task will run in parallel to one another | ||
+ | Future<SomeType> myFuture = fork(() -> { | ||
+ | /* Parallel task */ | ||
+ | }); | ||
+ | /* Main task */ | ||
+ | SomeType myValue = join(myFuture); | ||
+ | }</pre>}} | ||
+ | |||
+ | == Join == | ||
+ | <p>The various <code>join</code> functions gets the value of the <code>Future</code> or multiple <code>Future</code> objects passed in and returns the value(s) in a convenient manner. You can pass in any number of <code>Future<R></code> objects, collections of <code>Future<R></code>, arrays of <code>Future<R></code> and in the cases that you passed in multiple values, join can return either a list or an array, so most of the heavy-lifting should be taken care of by the compiler. Here are some general reminders/pointers regarding the <code>join</code> method. All <code>join</code> methods are a blocking methods, so if the value isn't ready, the program just sits there. <insert this is fine dog> Also, all <code>join</code> methods may throw <code>InterruptedException</code> or <code>ExecutionException</code>. You may notice that these are the same errors as are thrown in the <code>get</code> method of <code>Future<R></code>...</p> | ||
+ | |||
+ | == Fork Loop == | ||
+ | <p>1 down, 26 to go! Don't worry, things will get faster now as there is less to explain.</p> | ||
+ | |||
+ | === Fork-Looping with Indices === | ||
+ | <p>The first <code>fork_loop</code> we are going to explore has the following signature:</p> | ||
+ | |||
+ | <code>public static <R> [https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/concurrent/Future.html Future]<R>[] [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/api/TaskIntFunction.html fork_loop](int min, int maxExclusive, [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/api/TaskIntFunction.html TaskIntFunction]<R> function)</code> | ||
+ | |||
+ | <p>This version of <code>fork_loop</code> as a whole returns a, <code>Future<R>[]</code>, an array of <code>Future<R></code>. It takes in two integers, <code>min</code> and <code>maxExclusive</code>, then a <code>TaskIntFunction<R></code>. Following the link for the <code>TaskIntFunction<R></code> reveals that it is a function that has one parameter of type <code>int</code> and returns a value of type <code>R</code>. Under the hood, the <code>fork_loop</code> function creates one fork, one parallel branch, for each integer in the range <code>[min, maxExclusive)</code>, and passes to each branch its corresponding integer. Then, when the values return from the branches, the <code>join_fork</code> loop bundles them back up in an array, such that the return value of the <code>min</code>th branch is the 0th element in the returned array, and the return value of the <code>maxExclusive</code>th branch is the last element of the returned array. Isn't that handy? This <code>join_fork</code> function is very similar to a "standard" for loop <code>for(int i = min; i < maxExclusive; i++)</code>, just parallel. An example use case of this <code>fork_loop</code> is as follows:</p> | ||
+ | {{CollapsibleCode|join_fork example |<pre>public int[] parallel_double(int[] array) { | ||
+ | Future<Integer>[] updatedFuture = join_fork(0, array.length, (index) -> { | ||
+ | return array[index] * 2; | ||
+ | }); | ||
+ | |||
+ | return join(updatedFuture); | ||
+ | }</pre>}} | ||
+ | <p>So <code>parallel_double(input)[0] = input[0] * 2</code>, <code>parallel_double(input)[1] = input[1] * 2</code>, <code>parallel_double(input)[2] = input[2] * 2</code>, etc. Notice that we did not have to declare the type of <code>index</code>. This is because the definition of <code>join_fork</code> defines the third parameter, the lambda function, as a <code>TaskIntFunction<R></code>, which we already know takes in a value of type <code>int</code>.</p> | ||
+ | |||
+ | === Fork-Looping with Iterables and Arrays === | ||
+ | <p>We will now explore the following versions of <code>fork_loop</code>:</p> | ||
+ | |||
+ | <code>public static <T,R> [https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/List.html List]<[https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/concurrent/Future.html Future]<R>> [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/FJ.html#fork_loop(java.lang.Iterable,fj.api.TaskFunction) fork_loop]([https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/lang/Iterable.html Iterable]<T> iterable, [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/api/TaskFunction.html TaskFunction]<T,R> function)</code> | ||
+ | <code>public static <T,R> [https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/concurrent/Future.html Future]<R>[] [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/FJ.html#fork_loop(java.lang.Object[],fj.api.TaskFunction) fork_loop](T[] array, [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/fj/api/TaskFunction.html TaskFunction]<T,R> function)</code> | ||
+ | |||
+ | <p>The first version of <code>fork_loop</code> returns a <code>List<Future<R>></code>, a <code>List</code> of <code>Futures</code> of type <code>R</code>. It takes in an <code>Iterable<T></code> and a <code>TaskFunction<T,R></code>. Following the link for the <code>TaskFunction<T, R></code> reveals that it is a function that takes in a parameter of type <code>T</code> and returns a value of type <code>R</code>. So for each of the elements of the given <code>Iterable<T></code>, a parallel call is made to the given function with that element passed in as the parameter. The second version does much the same, except it operates using arrays.</p> | ||
+ | |||
+ | {{CollapsibleCode|join_fork list example |<pre>public List<Integer> parallel_double(List<ScaryType<Integer>[][]> list) { | ||
+ | List<Future<ScaryType<Integer>[][]>> myFuture = join_fork(list, (element) -> { | ||
+ | /* element is of type ScaryType<Integer>[][] */ | ||
+ | /* Parallel task */ | ||
+ | }); | ||
+ | |||
+ | return join(myFuture); | ||
+ | }</pre>}} | ||
+ | <p>Don't be afraid of the generics! The element type of a list is just whatever is within the angled brackets.</p> | ||
+ | |||
+ | {{CollapsibleCode|join_fork array example |<pre>public List<Integer> parallel_double(ScaryType<Integer>[][] array) { | ||
+ | Future<ScaryType<Integer>[]>[] myFuture = join_fork(array, (element) -> { | ||
+ | /* element is of type ScaryType<Integer>[] */ | ||
+ | /* Parallel task */ | ||
+ | }); | ||
+ | |||
+ | return join(myFuture); | ||
+ | }</pre>}} | ||
+ | <p>Don't be afraid of multiple arrays. The element of an array is just the type if you ignore the last pair of braces. If you ever forget, ask yourself "What do I get when I index into this array." That is an element.</p> | ||
+ | |||
+ | <p>These versions of <code>fork_loop</code> have a lot in common with a for each loop, and thus comes with many of the same programming considerations. If you need to index into an array, this might not be your best choice. But if you are simply iterating through some kind of collection, these could be a great choice.</p> | ||
+ | |||
+ | == Fork Loop with Index == | ||
+ | <p><code>fork_loop_with_index</code> is like a combination of the iterable <code>fork_loop</code> and the indexed <code>fork_loop</code></p> | ||
+ | |||
+ | <p>It takes in an <code>Iterable<T></code> and a <code>TaskFunctionWithIndex<T,R></code>. Looking at the documentation for a <code>TaskFunctionWithIndex<T,R></code>, we can see that it takes in two parameters, the first being an argument of type <code>T</code> and the second being an integer value, and returns a value of type <code>R</code>. It supplies an index along with each element of the <code>Iterable<T></code>. Use this function if you want to index while iterating across an entire <code>Iterable<T></code> without having to specify <code>min</code> nor <code>maxExclusive</code>.</p> |
Latest revision as of 07:19, 10 February 2023
Contents
Overview
This page will walk you through reading documentation and understanding what the functions do in a practical manner. There is an assumption that you have attended class and watched the lectures, so the terms are familiar, but there is trouble knowing what everything means/does.
Fork
Reading the Documentation
When we look at the docs, we are greeted with the following:
public static <T> Future<T> fork(TaskSupplier<T> task)
What does this mean? Let's break it down
Static
static
means that the function is related to the class, FJ
as opposed to a specific instance of that class. This means that we can call FJ.fork
directly, we don't need to create an instance of that class to do so. In our code, we typically just write fork
because Professor Cosgrove has imported the appropriate dependencies for you! In fact, this entire library is static, so you should just be calling these functions by their function name.
Lambda Functions
We see that fork
takes in one parameter of type TaskSupplier<T>
. If we follow the link, we see that a TaskSupplier<T>
is just a single function, which has a return type T
and can throw InterruptedException
and ExecutionException
.
T get() throws InterruptedException,ExecutionException
Thus, any function that takes in no parameters and returns anything qualifies as a TaskSupplier<T>
. In code for this course, you will often see code such as:
Lambda function |
---|
public int count(/* parameters */) { Future<Integer> firstHalfFuture = fork(() -> { /* Parallel task */ }); /* Main task */ int firstHalf = join(firstHalfFuture); /* more code */ } |
This is the same as:
Explicit function |
---|
public int myFunc() { /* Parallel task */ } public int count(/* parameters */) { Future<Integer> firstHalfFuture = fork(() -> myFunc()); /* Main task */ int firstHalf = join(firstHalfFuture); /* more code */ } |
In the first case, we defined the function inline at the fork
method call, while the second case is more akin to what you may be used to.
Generics
You may have the question, what is a value of type T
? <T>
means that we have a generic type called T
. T
isn't any one type in particular, but instead is just a guarantee that in the following code, whenever we see T
, it is referring to the same type. An example you may be more familiar with is the ArrayList<E>
class. If we look at the documentation for ArrayList<E>
, we see, among other things:
public class ArrayList<E>
public boolean add(E e)
public E get(int index)
In the case of ArrayList<E>
, E
is the name of the generic type. And thus, everywhere we see E
, it is the same type E
as was supplied to the class constructor. That is why when you create an ArrayList<Integer>
, you are able to use it for Integer
, but not some other type such as String
. Then, when you create an ArrayList<String>
, you are able to use it for String
, but not other types such as Integer
.
Coming back to the fork
function, we next see that the return type is Future<T>
. We can follow the link to the java documentation for Future
to see that when we call Future.get()
, it will return a value of type T
. Generally, in this course, we use T
when referring to the type of the data that is passed into the parallel task, and R
when referring to the type of the data that is returned by the parallel task.
Practical Explanation
So putting all of this together, the fork
function is a static
function that is tied to the FJ
class, and not any specific instance of that class. It takes in a function that returns any type T
, and itself returns a Future<T>
object, which will return a value of that same type T
when the get
function is called on it. The function that is passed into the fork
function is run in parallel to the main process. Thus, code in the following structure leads to a parallel program:
Fork template |
---|
// Parallel task and Main task will run in parallel to one another Future<SomeType> myFuture = fork(() -> { /* Parallel task */ }); /* Main task */ SomeType myValue = join(myFuture); } |
Join
The various join
functions gets the value of the Future
or multiple Future
objects passed in and returns the value(s) in a convenient manner. You can pass in any number of Future<R>
objects, collections of Future<R>
, arrays of Future<R>
and in the cases that you passed in multiple values, join can return either a list or an array, so most of the heavy-lifting should be taken care of by the compiler. Here are some general reminders/pointers regarding the join
method. All join
methods are a blocking methods, so if the value isn't ready, the program just sits there. <insert this is fine dog> Also, all join
methods may throw InterruptedException
or ExecutionException
. You may notice that these are the same errors as are thrown in the get
method of Future<R>
...
Fork Loop
1 down, 26 to go! Don't worry, things will get faster now as there is less to explain.
Fork-Looping with Indices
The first fork_loop
we are going to explore has the following signature:
public static <R> Future<R>[] fork_loop(int min, int maxExclusive, TaskIntFunction<R> function)
This version of fork_loop
as a whole returns a, Future<R>[]
, an array of Future<R>
. It takes in two integers, min
and maxExclusive
, then a TaskIntFunction<R>
. Following the link for the TaskIntFunction<R>
reveals that it is a function that has one parameter of type int
and returns a value of type R
. Under the hood, the fork_loop
function creates one fork, one parallel branch, for each integer in the range [min, maxExclusive)
, and passes to each branch its corresponding integer. Then, when the values return from the branches, the join_fork
loop bundles them back up in an array, such that the return value of the min
th branch is the 0th element in the returned array, and the return value of the maxExclusive
th branch is the last element of the returned array. Isn't that handy? This join_fork
function is very similar to a "standard" for loop for(int i = min; i < maxExclusive; i++)
, just parallel. An example use case of this fork_loop
is as follows:
join_fork example |
---|
public int[] parallel_double(int[] array) { Future<Integer>[] updatedFuture = join_fork(0, array.length, (index) -> { return array[index] * 2; }); return join(updatedFuture); } |
So parallel_double(input)[0] = input[0] * 2
, parallel_double(input)[1] = input[1] * 2
, parallel_double(input)[2] = input[2] * 2
, etc. Notice that we did not have to declare the type of index
. This is because the definition of join_fork
defines the third parameter, the lambda function, as a TaskIntFunction<R>
, which we already know takes in a value of type int
.
Fork-Looping with Iterables and Arrays
We will now explore the following versions of fork_loop
:
public static <T,R> List<Future<R>> fork_loop(Iterable<T> iterable, TaskFunction<T,R> function)
public static <T,R> Future<R>[] [,fj.api.TaskFunction) fork_loop](T[] array, TaskFunction<T,R> function)
The first version of fork_loop
returns a List<Future<R>>
, a List
of Futures
of type R
. It takes in an Iterable<T>
and a TaskFunction<T,R>
. Following the link for the TaskFunction<T, R>
reveals that it is a function that takes in a parameter of type T
and returns a value of type R
. So for each of the elements of the given Iterable<T>
, a parallel call is made to the given function with that element passed in as the parameter. The second version does much the same, except it operates using arrays.
join_fork list example |
---|
public List<Integer> parallel_double(List<ScaryType<Integer>[][]> list) { List<Future<ScaryType<Integer>[][]>> myFuture = join_fork(list, (element) -> { /* element is of type ScaryType<Integer>[][] */ /* Parallel task */ }); return join(myFuture); } |
Don't be afraid of the generics! The element type of a list is just whatever is within the angled brackets.
join_fork array example |
---|
public List<Integer> parallel_double(ScaryType<Integer>[][] array) { Future<ScaryType<Integer>[]>[] myFuture = join_fork(array, (element) -> { /* element is of type ScaryType<Integer>[] */ /* Parallel task */ }); return join(myFuture); } |
Don't be afraid of multiple arrays. The element of an array is just the type if you ignore the last pair of braces. If you ever forget, ask yourself "What do I get when I index into this array." That is an element.
These versions of fork_loop
have a lot in common with a for each loop, and thus comes with many of the same programming considerations. If you need to index into an array, this might not be your best choice. But if you are simply iterating through some kind of collection, these could be a great choice.
Fork Loop with Index
fork_loop_with_index
is like a combination of the iterable fork_loop
and the indexed fork_loop
It takes in an Iterable<T>
and a TaskFunctionWithIndex<T,R>
. Looking at the documentation for a TaskFunctionWithIndex<T,R>
, we can see that it takes in two parameters, the first being an argument of type T
and the second being an integer value, and returns a value of type R
. It supplies an index along with each element of the Iterable<T>
. Use this function if you want to index while iterating across an entire Iterable<T>
without having to specify min
nor maxExclusive
.