Studio 14: Shortest Paths, MST, and Greed

This week's studio will take a closer look at greedy algorithms.

Part A: Practice with Dijkstra's and Prim's Algorithms

Consider the following directed graph:

[Figure: a directed, edge-weighted graph (the graph used in the BFS and DFS studio).]

  • First, run Dijkstra's algorithm on this graph, starting from vertex A. If you ever have to choose between vertices at a given step, pick the one that comes first in the alphabet. (A code sketch of this tie-breaking rule appears after Question A3.)

Question A1: what are the distances assigned by Dijkstra's algorithm to each vertex?

Question A2: which edges are present in the resulting shortest-path tree?

  • Now pretend that the graph is undirected (that is, ignore the arrows on the edges), and run Prim's algorithm on the graph. Again, if you ever have to choose between vertices at a given step, pick the one that comes first in the alphabet.

Question A3: what is the MST computed by the algorithm, and what is its total weight?
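To make the tie-breaking rule concrete, here is a minimal Python sketch of Dijkstra's algorithm. The adjacency-map input format is an assumption for illustration (the studio's graph is in the figure above, not in this code); pushing (distance, vertex) pairs onto the heap makes ties in distance resolve toward the alphabetically first vertex.

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm with alphabetical tie-breaking.

    graph: assumed adjacency map, e.g. {'A': [('B', 4), ('C', 2)], ...}.
    The heap holds (distance, vertex) pairs, so equal distances are
    broken by vertex name, matching the studio's tie-breaking rule.
    """
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    parent = {source: None}            # records the shortest-path tree
    finalized = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in finalized:
            continue                   # stale heap entry; skip it
        finalized.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:        # relax the edge (u, v)
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, parent
```

The dist map holds the distances asked about in Question A1, and the parent map encodes the shortest-path tree of Question A2, for whatever graph you pass in.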

Part B: Proving Correctness in the Greedy Choice Paradigm

We saw in class that Prim's algorithm is an example of a greedy algorithm: it makes a sequence of choices, each optimizing a local criterion, that combine to yield a globally optimal solution. In this case, the algorithm builds up a set of edges $T$, and the greedy choice at each step is to pick the lowest-weight edge connecting a vertex already touched by $T$ to one not yet touched.
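Here is a minimal Python sketch of this greedy step, for comparison with the Dijkstra sketch above; the adjacency-map input is again an assumed format, with each undirected edge listed under both endpoints.

```python
import heapq

def prim(graph, start):
    """Prim's algorithm on an undirected weighted graph.

    graph: assumed adjacency map {vertex: [(neighbor, weight), ...]},
    with every undirected edge listed under both endpoints.
    The heap holds (weight, vertex, parent) triples, so the greedy
    choice is always a lowest-weight edge from a touched vertex to
    an untouched one, with weight ties broken alphabetically.
    """
    tree = []                          # the set T of chosen edges
    touched = set()
    heap = [(0, start, None)]
    while heap:
        w, u, p = heapq.heappop(heap)
        if u in touched:
            continue                   # u is already connected to T
        touched.add(u)
        if p is not None:
            tree.append((p, u, w))     # the greedy choice: add edge (p, u)
        for v, weight in graph[u]:
            if v not in touched:
                heapq.heappush(heap, (weight, v, u))
    return tree
```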

How do we show that a greedy algorithm like Prim's is correct, that is, that it always yields an optimal solution (in this case, a minimum spanning tree)? The general strategy is to show that the following claim holds after any number of greedy choices:

The next greedy choice plus all previous choices are consistent with some optimal solution.

In other words, making the greedy choice at each step is never a bad idea, because we can always find some optimal solution that makes both this choice and all of the previous ones. We can verify this claim by induction on the number of choices made.
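For problems like MST, where a solution is a set of choices, the claim can be written symbolically; the notation below ($T_k$ for the set of the first $k$ choices, $c_{k+1}$ for the next greedy choice) is introduced just for this restatement:

$$\exists \text{ an optimal solution } T^{*} \text{ such that } T_k \cup \{c_{k+1}\} \subseteq T^{*}.$$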

Question B1: does proving that the claim holds after any number of choices imply that the algorithm always returns an optimal solution? What other property must be true for this implication to hold?

For Prim's algorithm, the structure of this proof was roughly as follows. An optimal solution is a minimum spanning tree for the input graph $G$, and we need to show that $T$, the set of edges chosen so far, is consistent with some such tree. Consistency means that the edges of $T$ all appear together in some MST $T^{*}$ for $G$; that is, $T \subseteq T^{*}$.

  • In the base case, $T$ is empty (no choices have been made), and $T$ is therefore a subset of any MST for $G$.

  • In the general case, suppose $e$ is the next greedy choice of edge. By the inductive hypothesis (IH), $T$ is a subset of some MST $T^{*}$.

  • If $e \in T^{*}$, then the claim holds for $T \cup \left\{ e \right\}$ with $T^{*}$ itself. Otherwise, we need to show that there is some other spanning tree $T'$ that contains $T \cup \left\{e\right\}$ and is just as good as $T^{*}$ -- that is, some other optimal solution!

This process of finding a different optimal solution that makes the next greedy choice is sometimes called an exchange argument, because we exchange the optimal solution $T^{*}$ that does not make the choice for a new optimal solution $T'$ that does.

The details of the exchange argument are specific to the problem. In this case, adding $e$ to $T^{*}$ creates a cycle. Because $e$ crosses between the vertices touched by $T$ and those not touched by $T$, this cycle must contain another edge $e'$ of $T^{*}$ that also crosses between the two sides, and we can break the cycle by removing $e'$.

Question B2: why do we know that adding $e$ to $T^{*}$ creates exactly one cycle, rather than two or more?

Question B3: why do we know that removing $e'$ from $T^{*} \cup \left\{e\right\}$ leaves us with a tree $T'$?

Question B4: why do we know that the tree $T'$ spans $G$?

Since $e'$ also crosses between the two sides, and $e$ was the lowest-weight edge that does so, we have $w(e) \leq w(e')$. The new tree $T'$ therefore weighs no more than $T^{*}$, so it is also an MST, and our exchange succeeded.
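Spelling out the weights, with $T' = (T^{*} \cup \{e\}) \setminus \{e'\}$:

$$w(T') = w(T^{*}) + w(e) - w(e') \leq w(T^{*}),$$

and since $T^{*}$ has minimum weight among all spanning trees of $G$, the inequality must in fact be an equality.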

Part C: Greedy Choice in Other Contexts

Now that you've seen an exchange argument applied to prove the correctness of Prim's algorithm, you can apply the same principle to design and prove the correctness of other greedy algorithms as well.

First, let's look at Kruskal's MST algorithm, which we also saw in class. Kruskal's algorithm uses a different greedy choice: pick the lowest-weight edge $e$ connecting any two vertices in $G$, such that $T \cup \left\{e\right\}$ does not contain a cycle.
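Here is a minimal Python sketch of this choice, assuming the input arrives as a vertex list plus (weight, u, v) edge triples; a union-find structure over the components of $T$ is the standard way to test whether $T \cup \left\{e\right\}$ would contain a cycle.

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm: scan edges in order of weight, keeping
    each edge that joins two different components of the forest T
    (i.e., that does not create a cycle).

    edges: assumed list of (weight, u, v) triples.
    """
    parent = {v: v for v in vertices}  # union-find over components of T

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                   # no cycle: the greedy choice is safe
            parent[ru] = rv            # merge the two components
            tree.append((u, v, w))
    return tree
```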

Let's use the same inductive strategy to prove the correctness of Kruskal's greedy choice. The base case is the same as for Prim's algorithm. In the general case, we once again assume that $T$ is a subset of some MST $T^{*}$, and consider what happens if the next edge $e$ chosen is not in $T^{*}$.

Question C1: finish the correctness proof for Kruskal's algorithm using an exchange argument similar to that for Prim's algorithm.

What is the analog, in Kruskal's algorithm, of the two disjoint subsets of vertices connected by edge $e$ in Prim's algorithm? What is the analog of the edge $e'$?

Now let's consider a completely different problem that is also amenable to the greedy approach. In the barge-loading problem, you have a barge that can hold up to $C$ tons of cargo without sinking, along with $m$ containers of weights $w_1, w_2, \ldots, w_m$ tons. The goal is to load as many of these containers as possible onto the barge without sinking it.

Question C2: give an intuitively appealing greedy choice for which container you would load first to maximize the total number of containers loaded.

Which container gives you the most potential freedom to add others later?

Discuss your greedy choice with your TA to make sure you have a workable choice before continuing!

Now, let's set up an inductive correctness proof to show that your greedy choice is consistent with an optimal solution, that is, one with the maximum possible number of containers loaded. We proceed by induction on the number of containers loaded.

Question C3: what is the base case for your proof?

Question C4: now write the exchange argument for the general case.

Remember, you get to assume that all chosen containers except the last are consistent with some optimal solution $T^{*}$. Follow the same overall structure as the previous proofs to describe a change to $T^{*}$ that accommodates the next container chosen.

You can use the principle of exchange to design and prove the correctness of greedy algorithms in many different problem domains, such as scheduling and linear algebra.

Part D: Greed Isn't Always Optimal

Let's look at another basic optimization problem, the knapsack problem. You have a knapsack with capacity $C$ kilograms and a set of $m$ items. The $i$th item weighs $w_i$ kilograms and is worth $v_i$ dollars. The goal is to add a subset of these items to the knapsack so that it contains the greatest total value while having weight $\leq C$. In other words, if $K$ is the subset of items in the knapsack, then we wish to maximize $\sum_{i \in K} v_i$ subject to the constraint that $\sum_{i \in K} w_i \leq C$.

Question D1: how is this problem related to the barge-loading problem?

A natural greedy choice for this problem is the following. To each item $i$, assign a density $d_i = v_i / w_i$ (the item's value per unit weight). At each step of the algorithm, add the item of highest density that still fits in the knapsack, until no more items can be added. The idea is that, since the total weight of the knapsack is limited, we should try to take items that are the most valuable for their weight.
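A minimal Python sketch of this heuristic, assuming items are given as (value, weight) pairs; it implements exactly the density-ordered choice described above, whether or not that choice turns out to be optimal.

```python
def greedy_knapsack(items, capacity):
    """Density heuristic for the 0/1 knapsack problem.

    items: assumed list of (value, weight) pairs.
    Scans items in decreasing order of density v/w, adding each
    item that still fits; returns the chosen subset.
    """
    by_density = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    chosen, total_weight = [], 0
    for value, weight in by_density:
        if total_weight + weight <= capacity:
            chosen.append((value, weight))
            total_weight += weight
    return chosen
```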

Question D2: given your answer to D1, how is this greedy choice related to the greedy choice you defined for the barge-loading problem?

The question now is, does this greedy choice for the knapsack problem lead to an optimal algorithm?

Question D3: show via a small counterexample that the greedy algorithm does not always yield an optimal knapsack.

The knapsack problem is an example of an NP-optimization problem -- it is considered unlikely that there is an algorithm that finds an optimal solution in time polynomial in the input size.¹

For more on greedy algorithms, NP-optimization problems, and other advanced topics in algorithms, take CSE 347 and CSE 541!

Finally, suppose we modify the knapsack problem so that you are allowed to take only part of an item. In particular, it is permitted to add any weight $w \leq w_i$ of item $i$ to the knapsack, which contributes a proportional amount of value $w \cdot d_i$. A greedy choice for this fractional knapsack problem is the following: take as much of the highest-density item as will fit in the knapsack.
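A minimal Python sketch of the fractional greedy choice, under the same assumed (value, weight) input format; the only change from the 0/1 sketch above is that a partial amount of an item may be taken.

```python
def fractional_knapsack(items, capacity):
    """Greedy algorithm for the fractional knapsack problem.

    items: assumed list of (value, weight) pairs.
    Takes as much of the densest remaining item as fits, possibly
    a fraction of it; returns the total value packed.
    """
    by_density = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    remaining, total_value = capacity, 0.0
    for value, weight in by_density:
        if remaining <= 0:
            break
        take = min(weight, remaining)  # possibly only part of the item
        total_value += take * (value / weight)
        remaining -= take
    return total_value
```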

Question D4: prove that an algorithm that makes this greedy choice always yields a knapsack of optimal value.

  1. We saw a similar problem last week, the Hamiltonian path problem, which was stated to be NP-complete. What's the difference between NP-optimization and NP-complete? The latter applies only to decision problems (where the answer is either yes or no), such as the existence of a certain path, while the former applies to optimization problems, where the goal is to minimize or maximize some quantity (e.g., the knapsack value).