Quicksort Assignment

From CSE231 Wiki
Jump to navigation Jump to search

Motivation

Sorting is a problem that is well solved by divide and conquer algorithms. Quicksort is an oldie but a goodie. Developed in 1959 by Tony Hoare, it is still the fastest sort after all these years. As a bonus, the conquer portion can be easily parallelized. The divide portion can also be parallelized, although not trivially (which left as an optional fun challenge for those so inclined).

Background

In computer science, quicksort is a sorting algorithm which selects a pivot value, partitions all the values less than the pivot value on one side and the values greater on the other. By recursively performing this operation on the ever decreasing range lengths on the left and right, the array is sorted when the range length reaches a size of 1.

For more information on how this process works, visit the wikipedia page on quicksort.


Video: Parallel Quicksort  

Visualization

If you are unclear on how merge sort works, take a look at the visualgo explanation and visualization of quicksort.

Visalgo quicksort.png

Work and Span

Partition does the work of quicksort. The top-level partition must do N work. The next level does two ~N/2 partitions. The level below that performs four ~N/4 partitions. And so on. The table is wide and high. Thus the work of quicksort is .

For N=16:

<- - - - - - - - - - - - - - N - - - - - - - - - - - - - ->
^ partition [0, 16)
| partition [0, 8) partition [8, 16)
lg N partition [0, 4) partition [4, 8) partition [8, 12) partition [12, 16)
| partition [0, 2) partition [2, 4) partition [4, 6) partition [6, 8) partition [8, 10) partition [10, 12) partition [12, 14) partition [14, 16)
v base base base base base base base base base base base base base base base base

The span for the classic sequential partition is one partition of each of the levels (highlighted in blue). This works out to be the geometric series:

which is:

Classic parallel merge sort and quicksort span.svg

As a challenge problem, we parallelize the partition step, resulting in a span for quicksort of an outstanding .

Mistakes To Avoid

Attention niels epting.svg Warning: Be sure to invoke the Partitioner provided to the Sorter's constructor's partitionRange method. Do not implement SequentialPartioner and then re-implement it in the Sorters.
Attention niels epting.svg Warning: When checking the base case, remember to account for [minInclusive, maxExclusive). It is all too easy to get an off by 1 error and stack overflow.
Attention niels epting.svg Warning: When transitioning to sequential, make sure to NOT sort the entire array when you are only responsible for a range [min,max).

WarmUp

The JoinAll WarmUp prepares you for getting around with ConcurrentLinkedQueue's weakly consistent iterators. It is highly recommended.

Client

class: QuicksortClient.java CLIENT
package: sort
source folder: student/src/sort.quick.client/java

Note: Quicksort is not stable. You can see that after sorting by name, this is not preserved when sorting by length. For example, the 3 of length 4 come out Hera, Ares, Zeus.

QuicksortClient  
// https://en.wikipedia.org/wiki/Twelve_Olympians
String[] olympians = {
        "Zeus",
        "Hera",
        "Athena",
        "Apollo",
        "Poseidon",
        "Ares",
        "Artemis",
        "Demeter",
        "Aphrodite",
        "Dionysus",
        "Hermes",
        "Hephaestus",
};

Shuffler<String> shuffler = new LinearShuffler<>(new Random(System.currentTimeMillis()));
Partitioner<String> partitioner = new SequentialPartitioner<>();
IntPredicate isParallelDesired = (rangeLength) -> rangeLength > 3;
Sorter<String> sequentialQuicksorter = new SequentialQuicksorter<>(shuffler, partitioner);

Sorter<String> parallelQuicksorter = new ParallelQuicksorter<>(shuffler, partitioner, isParallelDesired, sequentialQuicksorter);

System.out.println("        original: " + Arrays.toString(olympians));

parallelQuicksorter.sort(olympians, Comparator.naturalOrder());
System.out.println("  sorted by name: " + Arrays.toString(olympians));

Comparator<String> lengthComparator = (a, b) -> b.length() - a.length();
parallelQuicksorter.sort(olympians, lengthComparator);
System.out.println("sorted by length: " + Arrays.toString(olympians));
QuicksortClient Output  
        original: [Zeus, Hera, Athena, Apollo, Poseidon, Ares, Artemis, Demeter, Aphrodite, Dionysus, Hermes, Hephaestus]
  sorted by name: [Aphrodite, Apollo, Ares, Artemis, Athena, Demeter, Dionysus, Hephaestus, Hera, Hermes, Poseidon, Zeus]
sorted by length: [Hephaestus, Aphrodite, Dionysus, Poseidon, Artemis, Demeter, Athena, Hermes, Apollo, Hera, Ares, Zeus]

Code to Investigate

Index Selection

InitialIndexSelector<T>

The partition step of quicksort starts by selecting an index in an array to serve as the pivot value.

public interface InitialIndexSelector<T> {
	int selectInitialIndex(T[] data, Comparator<T> comparator, int min, int maxExclusive);
}

RandomInitialIndexSelector<T>

Randomly selecting the initial index is the go to strategy to achieve provable runtime performance.

public final class RandomInitialIndexSelector<T> implements InitialIndexSelector<T> {
	private final Random random;

	public RandomInitialIndexSelector(Random random) {
		this.random = random;
	}

	@Override
	public int selectInitialIndex(T[] data, Comparator<T> comparator, int min, int maxExclusive) {
		int range = maxExclusive - min;
		return min + random.nextInt(range);
	}
}

Partition Pivot Location Result

PivotLocation

The partition step of quicksort... well... partitions the array. After a pivot value is selected, partition will not necessarily sort the array, but all of the values less than the pivot value will be in the lower part of the range and all of the values greater than the pivot value will be in the higher part of the array.

There are several different approaches to partition and there can be different pivot location characteristics. Another approach does even place the pivot value itself if at the pivot location and therefore the range of the pivot location is 0 in length! Another approach places the single value at the selected index at the pivot location so the range length is 1. Finally, one partition approach addresses the Dutch National Flag Problem by placing all values equal to the pivot value together in the center, therefore the range length of the pivot location can be >= 1.

To account for this variable pivot location range length, a class PivotLocation is provided:

public final class PivotLocation {
	private final int maxExclusiveOfLesserSide;
	private final int minOfGreaterSide;
	public PivotLocation(int maxExclusiveOfLesserSide, int minOfGreaterSide) {
		this.maxExclusiveOfLesserSide = maxExclusiveOfLesserSide;
		this.minOfGreaterSide = minOfGreaterSide;
	}
	public int maxExclusiveOfLesserSide() {
		return this.maxExclusiveOfLesserSide;
	}
	public int minOfGreaterSide() {
		return this.minOfGreaterSide;
	}
}

If a partitioner is called upon to partition a range [min, maxExclusive) it will return a Pivot Location [maxExclusiveOfLesserSide, minOfGreaterSide). The elements in [min, pivotLocation.maxExclusiveOfLesserSide) will be less than the pivot value, the elements in [pivotLocation.maxExclusiveOfLesserSide, pivotLocation.minOfGreaterSide) will be equal to the pivot value, and the elements in [pivotLocation.minOfGreaterSide, max) will be greater than the pivot value.

Code to Implement

Both the sequential and parallel quicksorts will be passed a Partitioner. Partitioning occurs during the divide phase of divide-and-conquer of the general recursive case.

You will need to implement quicksort sequentially and in parallel. Both implementations will naturally be recursive. The kernel method should call itself using recursion, but each public quicksort method should only call its kernel once to do the work.

Partition

Partition performs the grand majority of the work in quicksort. A sequential partitioner is required. A parallel partitioner is left as a challenge problem. Thus, there are separate classes to implement.

AbstractPartitioner

AbstractPartitioner simply keeps track of the InitialIndexSelector.

class: AbstractPartitioner.java Java.png
methods: constructor
initialIndexSelector
package: sort.quick.exercise
source folder: student/src/main/java

constructor and instance variable

Simply hang onto the specified initialIndexSelector in an instance variable for later use.

public AbstractPartitioner(InitialIndexSelector<T> initialIndexSelector) {
	throw new NotYetImplementedException();
}

initialIndexSelector

Return the previously specified to the constructor initialIndexSelector.

public InitialIndexSelector<T> initialIndexSelector() {
	throw new NotYetImplementedException();
}

SequentialPartitioner

class: SequentialPartitioner.java Java.png
methods: partitionRange
package: sort.quick.exercise
source folder: student/src/main/java

partitionRange

method: public PivotLocation partitionRange(T[] data, Comparator<T> comparator, int min, int maxExclusive) Sequential.svg (sequential implementation only)

There are many approaches to partition with different performance characteristics.

Requirements:
Green check.svgAll values left of the returned PivotLocation must be less than or equal to the value at the PivotLocation
Green check.svgAll values right of the returned PivotLocation must be greater than or equal to the value at the PivotLocation.
Green check.svgAll values at the Pivot Location must be equal.
Green check.svgThe pivot location must divide the problem into smaller sub-problems.
Attention niels epting.svg Alert:The HackerRank approach from the prep does not maintain the pivot value at the PivotLocation.
Thus the range between the pivot location's maxExclusiveOfLesserSide and minOfGreaterSide will be 0 in length.
As a result, care must be taken to ensure that partition produces a pivot such that there are always smaller sub-problems.
Attention niels epting.svg Alert:The HackerRank approach from the prep chooses the value at the midpoint as its pivot value.
Critically, it calculates the midpoint based on the *inclusive* maximum.
Believe it or not, the entire algorithm death spirals if you choose the midpoint based on the exclusive max.

Although an interesting challenge, you are not required handle the Dutch national flag problem.

Sort

interface Sorter

class: Sorter.java Java.png
methods: sort
package: sort.core
source folder: student/src/main/java

sort

method: default void sort(T[] data, Comparator<T> comparator) Sequential.svg (sequential implementation only)

This should be already complete from the Merge Sort Exercise.


class SequentialQuicksorter

class: SequentialQuicksorter.java Java.png
methods: sortRange
package: sort.quick.exercise
source folder: student/src/main/java
public class SequentialQuicksorter<T> extends AbstractQuicksorter<T>

constructor and instance variable

Simply hang onto the specified partitioner in an instance variable for later use.

public SequentialQuicksorter(Partitioner<T> partitioner) {
	throw new NotYetImplementedException();
}

partitioner

Return the previously specified to the constructor partitioner

public Partitioner<T> partitioner() {
	throw new NotYetImplementedException();
}

sortRange

method: public void sortRange(T[] data, Comparator<T> comparator, int min, int maxExclusive) Sequential.svg (sequential implementation only)

Implement the classic divide and conquer sequential algorithm here.

Attention niels epting.svg Alert:Be sure to leverage the Partitioner passed to the constructor to perform the divide step.

class ParallelQuicksorter

class: ParallelQuicksorter.java Java.png
methods: partitioner
isParallelDesired
sequentialSorter
sortShuffledRangeHelper
sortShuffledRange
package: sort.quick.exercise
source folder: student/src/main/java

Adapt your sequential algorithm to take advantage of what can be parallelized.

What can be in parallel?

What is dependent on the parallel tasks completing?

Attention niels epting.svg Warning:Leverage the provided private final IntPredicate isParallelDesired; field to test whether or not parallelism is warranted for the current range length.
Attention niels epting.svg Warning:Fall back to the provided private final Sorter<T> sequentialSorter; field when the isParallelDesired predicate indicates that parallelism is not longer warranted.

constructor and instance variables

Hang onto each of the specified parameters in instance variables for later use.

public ParallelQuicksorter(Partitioner<T> partitioner, IntPredicate isParallelDesired, Sorter<T> sequentialSorter) {
	throw new NotYetImplementedException();
}

partitioner

method: public Partitioner<T> partitioner() Sequential.svg (sequential implementation only)

isParallelDesired

method: public IntPredicate isParallelDesired() Sequential.svg (sequential implementation only)

sequentialSorter

method: public Sorter<T> sequentialSorter() Sequential.svg (sequential implementation only)

sortShuffledRangeHelper

method: private void sortShuffledRangeHelper(Queue<Future<Void>> futures, T[] data, Comparator<T> comparator, int min, int maxExclusive) Parallel.svg (parallel implementation required)

This method implements the essence of quicksort.

Test this isParallelDesired predicate on the range length first. If it passes, invoke the partitioner to perform the majority of the work followed by appropriately parallel recursive tasks. If the predicate does not pass the test, simply invoke the sequentialSorter.

Requirement:
Green check.svgDo not invoke the partitioner if the predicate fails the test. Let the sequential sorter handle it from here.
Requirements:
Green check.svgAll forked tasks must be added to the futures queue.
Green check.svgDo not join the tasks in this method. Joining is the responsibility of the sortShuffledRange method.

sortShuffledRange

This method is partially provided. Invokes sortShuffledRangeHelper on the complete range, and passes an additional ConcurrentLinkedQueue to capture the Futures associated with any and all forked tasks.

@Override
protected void sortShuffledRange(T[] data, Comparator<T> comparator, int min, int maxExclusive) throws InterruptedException, ExecutionException {
	Queue<Future<Void>> futures = new ConcurrentLinkedQueue<>();
	sortShuffledRangeHelper(futures, data, comparator, min, maxExclusive);

	throw new NotYetImplementedException();

}

When sortShuffledRangeHelper is complete, each task in the queue must be joined. Since ConcurrentLinkedQueue's iterators are weakly consistent, and all of the tasks may not yet be in the queue, we must join all the tasks as in the Join All warmup.

Attention niels epting.svg Alert:do not call sortShuffledRange recursively. sortShuffledRangeHelper should do the recursive work.

Challenge

You can divide and conquer the divide step in quicksort. The pan can be improved from down to .

For details on how to complete this challenge, check out: Parallel Partition

Testing Your Solution

class: __QuicksorterTestSuite.java Junit.png
package: sort.quick.exercise
source folder: testing/src/test/java

Pledge, Acknowledgments, Citations

file: quicksort-pledge-acknowledgments-citations.txt

More info about the Honor Pledge