Studio 6: MergeSort and HeapSort

Introduction

We briefly discussed two sorting algorithms this week -- MergeSort and HeapSort. This studio will take a closer look at these important algorithms, especially MergeSort.

Part A: Merging Two Sorted Arrays

The basic idea of MergeSort is as follows: given an unsorted array $A[1..n]$,

  1. divide $A$ into two halves, say $A[1..n/2]$ and $A[n/2+1..n]$;
  2. recursively sort each half;
  3. merge the two sorted halves into a sorted whole.

Let's focus on the last step. Assume that you are given sorted arrays $A[1..n]$ and $B[1..m]$, and you want to merge these into an (initially empty) array $C[1..n+m]$ so that after merging, $C$ is sorted and contains every element of $A$ and $B$.

Here's a sketch of the algorithm: maintain indices $i$, $j$, and $k$ that point into arrays $A$, $B$, and $C$ respectively. $A[i]$ and $B[j]$ are the next elements from each input array that need to be copied to $C$, while $C[k]$ is the first free space in $C$ to which an element can be written. Repeatedly copy the smaller of $A[i]$ and $B[j]$ to $C[k]$, then increment $k$ along with whichever of $i$ or $j$ supplied the copied element.

Question A1: based on this sketch, write pseudocode to merge $A$ and $B$ into $C$, maintaining the three indices described above.

To make your life easier, you may assume that the arrays $A$ and $B$ are followed by sentinel elements $A[n+1]$ and $B[m+1]$. These sentinels are both $\infty$, which is larger than any value in either $A$ or $B$, and should not be copied to $C$.
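To make the sketch concrete, here is one way such a merge might look in Python (0-indexed, unlike the 1-indexed conventions above, with float('inf') playing the role of the sentinel). The function name merge_with_sentinels is ours, and this is only one plausible shape for an answer to A1, not the required one.

def merge_with_sentinels(A, B):
    # A and B are sorted lists; returns a new sorted list C containing
    # every element of A and B.
    INF = float('inf')
    A = A + [INF]                    # sentinel: larger than any real element
    B = B + [INF]
    n, m = len(A) - 1, len(B) - 1    # sizes without the sentinels
    C = [None] * (n + m)
    i = j = 0                        # next unread positions in A and B
    for k in range(n + m):           # k is the next free slot in C
        if A[i] <= B[j]:             # copy the smaller front element
            C[k] = A[i]
            i += 1
        else:
            C[k] = B[j]
            j += 1
    return C

For example, merge_with_sentinels([1, 3, 5, 7], [2, 4, 6]) returns [1, 2, 3, 4, 5, 6, 7].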

Question A2: try to convince your TA informally that the resulting code is plausibly correct. Illustrate its behavior on a pair of small input arrays (say, of size 4 or 5).

How can you verify more carefully that your merge algorithm is right? Your code probably does some work in a loop. Just as for binary search last week, you can establish a loop invariant and a progress condition that together imply correctness.

Question A3: is there an expression involving the indices $i$ and $j$ that increases on every pass through the loop? Using this expression, formulate a progress condition and use it to argue that your algorithm eventually terminates.

A reasonable loop invariant might be that the prefix of $C$ written so far contains all the elements read from some prefixes of $A$ and $B$, in sorted order.

Question A4: state this invariant precisely and prove inductively that it holds after any number of loop iterations.

Exactly how you formulate the invariant will depend on how your code increments $i$ and $j$. All of $i$, $j$, and $k$ should appear in its statement.

Hint: make sure your invariant holds before any loop iterations have occurred. In your proof, you may want to take advantage of the fact that $A[1..0]$ formally corresponds to an empty array.
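One plausible formulation, sketched here under the convention that $i$, $j$, and $k$ all start at 1 and always point at the next unread or unwritten position, is:

  At the start of each loop iteration, $C[1..k-1]$ is sorted and contains exactly the elements of $A[1..i-1]$ and $B[1..j-1]$; moreover $k = i + j - 1$, and every element of $C[1..k-1]$ is at most $\min(A[i], B[j])$.

Your own statement may differ if your code updates the indices differently. The last clause is the part that lets the inductive step go through when the next element is written to $C[k]$.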

Question A5: argue that your loop invariant plus your progress condition together imply that the final array $C$ is correctly sorted and contains exactly the elements of $A$ and $B$.

Discuss your proof with your TA when you think you've got it done, if not before.

Part B: Performance of MergeSort

Given the merging procedure Merge(A,B,C) that you designed above, we can now describe the full MergeSort algorithm as follows:

MergeSort(A[1..n])
  if (n <= 1)
    return A
  else
    Alo <- A[1..n/2]
    Ahi <- A[n/2+1..n]
    Slo <- MergeSort(Alo)
    Shi <- MergeSort(Ahi)
    D <- a new array of size n
    Merge(Slo,Shi,D)
    return D

The three arguments to the Merge function correspond to the $A$, $B$, and $C$ arrays from the analysis in Part A.
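Continuing the illustrative Python sketch from Part A (again a possibility, not the required form), the pseudocode above might be rendered as follows. Note that this version lets the merge allocate and return its output list rather than passing a third array D.

def merge_sort(A):
    # Returns a new sorted list containing the elements of A.
    n = len(A)
    if n <= 1:
        return A[:]                          # base case: already sorted
    mid = n // 2
    s_lo = merge_sort(A[:mid])               # sort the lower half
    s_hi = merge_sort(A[mid:])               # sort the upper half
    return merge_with_sentinels(s_lo, s_hi)  # merge into a fresh list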

Question B1: what is the asymptotic complexity of your Merge operation, applied to two arrays each of size $n/2$?

Question B2: write a recurrence for the running time of MergeSort on inputs of size $n$, using the code above and the complexity of Merge you inferred in B1.

Question B3: solve your recurrence by the method of your choice to get the asymptotic complexity of MergeSort. Is the recurrence balanced?

The MergeSort algorithm divides the input array into two halves. We could equally well divide it into thirds, quarters, or even more pieces. Does this choice affect performance?

Question B4: formulate a new recurrence for a version of MergeSort that splits its input array into $k$ equal-sized pieces and recurses on each piece.

You may assume that it is possible to merge $k$ sorted subarrays of size $n/k$ in the same asymptotic time needed to merge two subarrays of size $n/2$. You may wish to think about how to do such a $k$-way merge in a style similar to the code you wrote for Part A.
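For illustration only, here is one way a $k$-way merge in the style of Part A might look in Python: keep one index per subarray (plus a sentinel at the end of each) and repeatedly copy the smallest front element. Scanning all $k$ fronts to find the smallest is the most direct generalization; when checking the assumption above, you may want to think about how that scan could be made faster, for instance with a heap over the $k$ fronts.

def k_way_merge(runs):
    # runs is a list of k sorted lists; returns one sorted list
    # containing every element of every run.
    INF = float('inf')
    runs = [run + [INF] for run in runs]     # sentinel at the end of each run
    total = sum(len(run) - 1 for run in runs)
    idx = [0] * len(runs)                    # next unread position in each run
    out = []
    for _ in range(total):
        # pick the run whose front element is currently smallest
        r = min(range(len(runs)), key=lambda t: runs[t][idx[t]])
        out.append(runs[r][idx[r]])
        idx[r] += 1
    return out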

Question B5: solve your new recurrence by the method of your choice. How does dividing the array into more pieces affect your algorithm's asymptotic complexity?

Part C: Space Requirement of MergeSort vs HeapSort

We now have two efficient sorting algorithms: MergeSort and HeapSort. Each algorithm has its own advantages that make it more attractive for some uses.

One disadvantage of MergeSort is that the merge step needs separate input and output arrays. That is, when we merge from $A$ and $B$ into $C$, we can't store $C$ in the same memory as $A$ and $B$, because writing into $C$ might overwrite our input data before we finish using it!

Question C1: every time it calls Merge, MergeSort allocates a new array big enough to hold the result of the merge. How much total space will be allocated when sorting an array of size $n$? (Formulate a recurrence and get an asymptotic answer.)

Question C2: can you think of a way to modify MergeSort so that it uses only $O(n)$ memory (in addition to the input array) to sort an array of size $n$?

Hint: once you have sorted two subarrays of size $m$ into an array of size $2m$, do you need the arrays of size $m$ anymore?

While we can reduce the amount of extra memory used by MergeSort, there is no way (well, no obvious way) to make it truly in-place. An in-place sorting algorithm overwrites its input array with the sorted version of the array and uses only $O(1)$ extra memory besides that occupied by the input array.

Question C3: what advantages might an in-place sorting algorithm have over one that needs $\Theta(n)$ extra memory?

In contrast to MergeSort, HeapSort can easily be implemented in place by having the input array, the intermediate heap, and the output array all share the same space.

Question C4: describe how to convert an array $A[1..n]$ into a heap without allocating an array separate from $A$ to hold this heap.

Hint: grow your heap from the start of $A$ using successive insertions.
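As a purely illustrative sketch of this hint (0-indexed Python, max-heap convention to match C5; the helper names are ours), growing the heap over a prefix of $A$ by successive insertions might look like:

def sift_up(A, t):
    # Restore the max-heap property after placing a new element at index t.
    while t > 0:
        parent = (t - 1) // 2
        if A[parent] >= A[t]:
            break
        A[parent], A[t] = A[t], A[parent]
        t = parent

def build_heap_in_place(A):
    # After the t-th iteration, A[0..t] is a max-heap; the rest of A is untouched.
    for t in range(1, len(A)):
        sift_up(A, t)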

Question C5: now suppose that the heap you created in C4 is max-first. Describe how to use $n$ extractions to create a sorted array from the heap, without allocating a separate array to hold the output.

Hint: where in the sorted version of $A$ should you write the value extracted first from the heap? Is there room to do so?
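Under the same conventions, one possible shape for the extraction phase is sketched below: the current maximum is swapped to the last position of the region that is still heap-ordered, and the heap property is then repaired on the shrunken prefix.

def sift_down(A, root, size):
    # Restore the max-heap property on A[0..size-1], assuming only
    # A[root] may violate it.
    while True:
        left, right = 2 * root + 1, 2 * root + 2
        largest = root
        if left < size and A[left] > A[largest]:
            largest = left
        if right < size and A[right] > A[largest]:
            largest = right
        if largest == root:
            return
        A[root], A[largest] = A[largest], A[root]
        root = largest

def heap_sort_extract(A):
    # Assumes A is already a max-heap (e.g. built as in the previous sketch).
    for end in range(len(A) - 1, 0, -1):
        A[0], A[end] = A[end], A[0]   # the current maximum goes to its final slot
        sift_down(A, 0, end)          # repair the heap on A[0..end-1]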