Understanding Fork Join

Overview

This page walks you through reading the documentation and understanding what the functions do in practice. It assumes that you have attended class and watched the lectures, so the terms are familiar, but you may still have trouble knowing what everything means or does.

Fork

Reading the Documentation

When we look at the docs, we are greeted with the following:

public static <T> Future<T> fork(TaskSupplier<T> task)

What does this mean? Let's break it down.

Static

static means that the function belongs to the class FJ, as opposed to a specific instance of that class. This means that we can call FJ.fork directly; we don't need to create an instance of the class to do so. In our code, we typically just write fork because Professor Cosgrove has imported the appropriate dependencies for you! In fact, this entire library is static, so you should just be calling these functions by their function name.
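To make the idea concrete, here is a minimal sketch using only the standard library. The class StaticCallDemo and its method twice are hypothetical names made up for illustration; Math.max is a real static method, and the static import at the top is the same mechanism that lets you write a bare function name instead of ClassName.functionName.

```java
import static java.lang.Math.max; // static import: lets us write max(...) instead of Math.max(...)

class StaticCallDemo {
    // A static method belongs to the class itself, not to any one object.
    public static int twice(int x) {
        return 2 * x;
    }

    public static void main(String[] args) {
        // Qualified call through the class name; no `new StaticCallDemo()` is needed.
        int a = StaticCallDemo.twice(21);
        // Unqualified call, made possible by the static import above.
        int b = max(a, 7);
        System.out.println(a + " " + b);
    }
}
```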

Lambda Functions

We see that fork takes in one parameter of type TaskSupplier<T>. If we follow the link, we see that a TaskSupplier<T> is just a single function, which has a return type T and can throw InterruptedException and ExecutionException.

T get() throws InterruptedException, ExecutionException

Thus, any function that takes in no parameters and returns anything qualifies as a TaskSupplier<T>. In code for this course, you will often see code such as:

Lambda function
public int count(/* parameters */) {
    Future<Integer> firstHalfFuture = fork(() -> {
        /* Parallel task */
    });
    /* Main task */
    int firstHalf = join(firstHalfFuture);
    /* more code */
}

This is the same as:

Explicit function

public int myFunc() {
    /* Parallel task */
}

public int count(/* parameters */) {
    Future<Integer> firstHalfFuture = fork(() -> myFunc());
    /* Main task */
    int firstHalf = join(firstHalfFuture);
    /* more code */
}

In the first case, we defined the function inline at the fork method call, while the second case is more akin to what you may be used to.
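If it helps to see the equivalence actually run, here is a small sketch. The TaskSupplier interface below is a hypothetical stand-in that only mirrors the get() signature shown above; it is not the course library's class. The helper runNow is also made up for illustration, and it simply calls the function on the current thread, so nothing here is parallel.

```java
import java.util.concurrent.ExecutionException;

class SupplierDemo {
    // Hypothetical stand-in that mirrors the single method shown in the docs.
    @FunctionalInterface
    interface TaskSupplier<T> {
        T get() throws InterruptedException, ExecutionException;
    }

    // An ordinary named method that takes no parameters and returns a value.
    public static int myFunc() {
        return 40 + 2;
    }

    // Hypothetical helper: runs the supplier immediately on the calling thread.
    public static <T> T runNow(TaskSupplier<T> task) {
        try {
            return task.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int inline = runNow(() -> { return 40 + 2; }); // block lambda, defined inline
        int expression = runNow(() -> 40 + 2);          // expression lambda
        int named = runNow(() -> myFunc());             // lambda that calls a named method
        int reference = runNow(SupplierDemo::myFunc);   // method reference
        System.out.println(inline + " " + expression + " " + named + " " + reference);
    }
}
```

All four calls hand runNow a function with no parameters and a return value, which is all the interface asks for.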

Generics

You may have the question, what is a value of type T? <T> means that we have a generic type called T. T isn't any one type in particular, but instead is just a guarantee that in the following code, whenever we see T, it is referring to the same type. An example you may be more familiar with is the ArrayList<E> class. If we look at the documentation for ArrayList<E>, we see, among other things:

public class ArrayList<E>

public boolean add(E e)

public E get(int index)

In the case of ArrayList<E>, E is the name of the generic type. Thus, everywhere we see E, it is the same type E that was supplied when the list was created. That is why when you create an ArrayList<Integer>, you are able to use it for Integer, but not some other type such as String. Likewise, when you create an ArrayList<String>, you are able to use it for String, but not other types such as Integer.

Coming back to the fork function, we next see that the return type is Future<T>. We can follow the link to the Java documentation for Future to see that when we call get() on a Future<T>, it will return a value of type T. Generally, in this course, we use T when referring to the type of the data that is passed into the parallel task, and R when referring to the type of the data that is returned by the parallel task.
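Here is a short sketch of generics in action, using only the standard library. The class GenericsDemo and the method lastOf are hypothetical names chosen for this example; the point is that a single generic method works for both list types, with T standing for a different concrete type in each call.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {
    // <T> declares a type parameter; every T below is the same type for a given call.
    public static <T> T lastOf(List<T> items) {
        return items.get(items.size() - 1);
    }

    public static void main(String[] args) {
        ArrayList<Integer> numbers = new ArrayList<>();
        numbers.add(3);
        numbers.add(5);
        // numbers.add("five"); // would not compile: E is Integer for this list

        ArrayList<String> words = new ArrayList<>();
        words.add("fork");
        words.add("join");

        Integer lastNumber = lastOf(numbers); // T is Integer here
        String lastWord = lastOf(words);      // T is String here
        System.out.println(lastNumber + " " + lastWord);
    }
}
```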

Practical Explanation

So putting all of this together, the fork function is a static function that is tied to the FJ class, and not any specific instance of that class. It takes in a function that returns any type T, and itself returns a Future<T> object, which will return a value of that same type T when the get function is called on it. The function that is passed into the fork function is run in parallel with the main task. Thus, code in the following structure leads to a parallel program:

Fork template
// Parallel task and Main task will run in parallel to one another
Future<SomeType> myFuture = fork(() -> {
    /* Parallel task */
});
/* Main task */
SomeType myValue = join(myFuture);
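For a version of this template that runs without the course library, here is a sketch that swaps in the standard library. It assumes CompletableFuture.supplyAsync as a rough stand-in for fork and CompletableFuture's join() as a rough stand-in for join; the course functions may behave differently in their details. The names ForkJoinSketch, sumRange, and parallelSum are made up for this example.

```java
import java.util.concurrent.CompletableFuture;

class ForkJoinSketch {
    // Sums the elements of array in the half-open range [from, to).
    public static int sumRange(int[] array, int from, int to) {
        int sum = 0;
        for (int i = from; i < to; i++) {
            sum += array[i];
        }
        return sum;
    }

    public static int parallelSum(int[] array) {
        int mid = array.length / 2;
        // Stand-in for fork(...): start summing the first half on another thread.
        CompletableFuture<Integer> firstHalfFuture =
                CompletableFuture.supplyAsync(() -> sumRange(array, 0, mid));
        // Main task: the calling thread sums the second half in the meantime.
        int secondHalf = sumRange(array, mid, array.length);
        // Stand-in for join(...): block until the forked task has finished.
        int firstHalf = firstHalfFuture.join();
        return firstHalf + secondHalf;
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(new int[] {1, 2, 3, 4, 5}));
    }
}
```

Notice that the main task sits between the fork and the join. If the join came immediately after the fork, the calling thread would just wait and nothing would overlap.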

Join

The various join functions get the value of the Future, or of the multiple Future objects, passed in and return the value(s) in a convenient manner. You can pass in any number of Future<R> objects, collections of Future<R>, or arrays of Future<R>. In the cases where you pass in multiple values, join can return either a list or an array, so most of the heavy lifting should be taken care of by the compiler.

Here are some general reminders/pointers regarding the join methods. All join methods are blocking methods, so if the value isn't ready, the program just sits there and waits. Also, all join methods may throw InterruptedException or ExecutionException. You may notice that these are the same exceptions as are thrown by the get method of Future<R>...
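Since the course's join functions are not shown here, the sketch below uses the standard Future.get() directly to illustrate the two points above: the call blocks until the value is ready, and a failure inside the task surfaces as an ExecutionException whose cause is the original exception. The names BlockingGetDemo and waitFor are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingGetDemo {
    // Runs one task and returns a short description of what get() did.
    public static String waitFor(boolean shouldFail) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = executor.submit(() -> {
                Thread.sleep(50); // pretend the task takes a while
                if (shouldFail) {
                    throw new IllegalStateException("task failed");
                }
                return 42;
            });
            // get() blocks the calling thread until the value is ready.
            return "value " + future.get();
        } catch (ExecutionException e) {
            // The task itself threw; its exception is available as the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            // The waiting thread was interrupted; restore the interrupt flag.
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(waitFor(false));
        System.out.println(waitFor(true));
    }
}
```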

Fork Loop

1 down, 26 to go! Don't worry, things will get faster now as there is less to explain.

Forking by Index

The first fork_loop we are going to explore has the following signature:

public static <R> Future<R>[] fork_loop(int min, int maxExclusive, TaskIntFunction<R> function)

This version of fork_loop returns a Future<R>[], an array of Future<R>. It takes in two integers, min and maxExclusive, then a TaskIntFunction<R>. Following the link for TaskIntFunction<R> reveals that it is a function that has one parameter of type int and returns a value of type R. Under the hood, the fork_loop function creates one fork, one parallel branch, for each integer in the range [min, maxExclusive), and passes to each branch its corresponding integer. It then bundles the resulting Future objects up in an array, such that the Future for the branch given min is the 0th element of the returned array, and the Future for the branch given maxExclusive - 1 is the last element. Isn't that handy? This fork_loop function is very similar to a "standard" for loop for(int i = min; i < maxExclusive; i++), just parallel. An example use case of this fork_loop is as follows:

fork_loop example
public int[] parallel_double(int[] array) {
    Future<Integer>[] updatedFutures = fork_loop(0, array.length, (index) -> {
        return array[index] * 2;
    });
    return join(updatedFutures);
}

So parallel_double(input)[0] = input[0] * 2, parallel_double(input)[1] = input[1] * 2, parallel_double(input)[2] = input[2] * 2, etc. Notice that we did not have to declare the type of index. This is because the definition of fork_loop declares the third parameter, the lambda function, as a TaskIntFunction<R>, which we already know takes in a value of type int.
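To see what forking by index might look like under the hood, here is a rough sketch built on the standard library. It is an assumption about the general shape, not the course library's implementation: the hypothetical helper forkLoop returns a List of CompletableFuture instead of a Future<R>[] (Java makes generic arrays awkward to create), and it uses java.util.function.IntFunction in place of TaskIntFunction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

class IndexLoopSketch {
    // Hypothetical helper: starts one asynchronous task per index in [min, maxExclusive).
    public static <R> List<CompletableFuture<R>> forkLoop(
            int min, int maxExclusive, IntFunction<R> function) {
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (int i = min; i < maxExclusive; i++) {
            final int index = i; // lambdas may only capture effectively final locals
            // The task for index i lands at position i - min in the list.
            futures.add(CompletableFuture.supplyAsync(() -> function.apply(index)));
        }
        return futures;
    }

    public static int[] parallelDouble(int[] array) {
        List<CompletableFuture<Integer>> futures =
                forkLoop(0, array.length, index -> array[index] * 2);
        int[] doubled = new int[array.length];
        for (int i = 0; i < doubled.length; i++) {
            doubled[i] = futures.get(i).join(); // blocks until task i has finished
        }
        return doubled;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(parallelDouble(new int[] {1, 2, 3})));
    }
}
```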

Forking with Iterables

The next fork_loop we are going to explore has the following signature:

public static <T,R> List<Future<R>> fork_loop(Iterable<T> iterable, TaskFunction<T,R> function)

This version of fork_loop returns a List<Future<R>>, a List of Futures of type R. It takes in an Iterable<T> and a TaskFunction<T,R>. Following the link for the TaskFunction<T, R> reveals that it is a function that takes in a parameter of type T and returns a value of type R.

fork_loop example
public List<Integer> parallel_double(List<Integer> list) {
    List<Future<Integer>> updatedFutures = fork_loop(list, (value) -> {
        return value * 2;
    });
    return join(updatedFutures);
}
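Here is a comparable sketch for the Iterable version, again using the standard library as a stand-in rather than the course library itself. The hypothetical helper forkLoop takes a java.util.function.Function in place of TaskFunction, and the example maps String to Integer so that T and R are visibly different types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class IterableLoopSketch {
    // Hypothetical helper: starts one asynchronous task per item, in iteration order.
    public static <T, R> List<CompletableFuture<R>> forkLoop(
            Iterable<T> iterable, Function<T, R> function) {
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (T item : iterable) {
            futures.add(CompletableFuture.supplyAsync(() -> function.apply(item)));
        }
        return futures;
    }

    // T is String (what goes into each task) and R is Integer (what comes out).
    public static List<Integer> parallelLengths(List<String> words) {
        List<CompletableFuture<Integer>> futures = forkLoop(words, word -> word.length());
        List<Integer> lengths = new ArrayList<>();
        for (CompletableFuture<Integer> future : futures) {
            lengths.add(future.join()); // blocks until that task has finished
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(parallelLengths(Arrays.asList("fork", "join", "loop!")));
    }
}
```

Because the futures are collected in iteration order, the results line up with the input items even though the tasks may finish in any order.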
