Studio 2: Obtaining better asymptotic performance for a list

Pre-studio: Review of Linked Lists

Open Eclipse and do a Team/Pull on your repo to make sure it is up to date.

This studio focuses on the complexity of operations on linked lists. We have provided you with some linked-list code from CSE 131 in the studio2.lists package.

  • Take a look at this code, paying particular attention to the implementation in LinkedList.java, before studio, and try to understand how it works. In particular, take a few minutes to watch this video from CSE131, which discusses how to add an element to the end of the list (the addLast() operation). You'll be studying and modifying the code for this procedure as part of your studio.

Part A: Probe the Costs of Linked Lists

We'll start by studying the cost of some methods in the LinkedList class.

  • Go to the studio2 package in your repository, open the AddLastNoTail class, and run it.

The work performed by AddLastNoTail is mostly in the run() method of its base class AddLastBase. That method appends $n$ successive random integers to the end of the list.

  • Refresh the outputs folder and find the files of the time and ticks counts generated by running AddLastNoTail. Plot the running times in these files as a function of $n$.

Question A1: How would you describe the curves for ticks and time as functions of $n$?

You may be wondering where the ticker calls that generated tick counts for this program are located. They are in the addLast() method in LinkedList.java in the studio2.lists package, which you looked at during the pre-studio.

addLast() is instrumented to increment the tick counter once for each constant-time statement in the function (other than calls to the ticker itself). This level of detail may not be necessary to get an idea of the function's asymptotic complexity, but check to make sure that you know why each tick() call in the function has the count it does.

Question A2: Based on the code of addLast() and what you've learned from Studio 1 and our class meetings, explain why the ticks and time curves for AddLastNoTail have the shapes you observe.

Part B: Improve the Behavior of Appending to a List

  • Find and open the LinkedListWithTail class in the studio2.lists package.

This is the only class you will modify in this exercise. As given to you, its behavior is identical to AddLastWithTail.

  • Modify this class so that it includes a data member tail that is always a reference to the last element in the list, and so that it uses tail to help add an element to the end of the list. Ask your TA for help if needed.

    • You'll need to modify any method that changes the list to correctly maintain tail so that it always points to the last element of the list.

    • What should tail be after the constructor for the list completes?

    • How can you ensure that tail is set correctly at the end of the addLast() function, assuming it was set correctly at the beginning of the function?

      The property that tail always points to the end of the list is an invariant of the class -- it is a property that must be true true any time you look at the list. Once you know the invariant that you are trying to maintain, you can reason separately about what each function should do to maintain it. We'll see more examples of data structure invariants later in the semester.

  • Before you go on, modify the ticker calls in addLast() so that they again count one tick for every constant-time statement in the function. Your tick calls should match your new code.

  • Open the AddLastWithTail class and run it. This class calls your modified code to add $n$ random elements to a list.

  • Refresh outputs again and produce ticks and time plots for your modified code as usual.

Question B1: what shape of running-time curve do you now see for $n$ elements to a list? Explain why the shape of the curve changed as a result of your changes.

Question B2: how much additional storage per list did your change require?

Question B3: is adding a tail pointer a worthwhile extension to a linked list? When might it not be worthwhile?

Part C: Improve the Behavior of Querying List Length

For this part, continue to modify (only) the LinkedListWithTail class.

Let's consider the cost of determining the size of a list of $n$ elements. The Size class adds $n$ elements to a list, then calls the getSize() function to determine its size. It counts ticks only for the getSize() function, not for adding elements.

  • Look at the implementation of getSize(), and determine how many ticks it should take on a list of size $n$.

Question C1: give an exact expression for the number of ticks used by getSize() as a function of $n$, and justify how you got it.

  • Design a modification of the list class so that a call to getSize() can always run in constant ($\Theta(1)$) time, regardless of $n$.

Question C2: what invariant did you add to your class to make constant-time getSize() possible? What do you have to do to maintain this invariant?

  • Measure your modified list's performance by running the Size and AddListNoTail classes and plotting the resulting running times.

Question C3: what evidence do you have that your getSize() method is really constant-time? Did you change the asymptotic complexity of any other list operations in the process?

Part D: Further Extensions

For this part, there is nothing to code, but you should think about how to implement the suggested extensions of a linked list and how to justify their correctness and their impact on running time.

  • The midpoint of a list of length $n$ is Element $\lceil n/2 \rceil$, where Element 1 is the first element in the list. For example, in a list $[a, b, c]$, the midpoint is $b$; in a list $[u, v, w, x, y, z]$, the midpoint is $w$.

Question D1: what is the asymptotic complexity of returning the midpoint of a list of length $n$, assuming no extensions to the LinkedList class?

  • Design an extension to the linked list data type that lets you access the midpoint of the list in constant time, no matter how many elements have been inserted.

Question D2: what invariant do you need to maintain? How do you extend the methods in the LinkedList class to maintain it? Does their asymptotic running time change?

  • Now consider how to extend the LinkedList class so that you can always return the maximum element in the list in constant time.

Question D3: describe your extension strategy and its impact on the complexity of the list operations.

Question D4: suppose we augment LinkedList so that we can delete an element from the linked list, given a reference to it, in constant time (this would require our list to be doubly-linked). How does the complexity of deletion change if you also want to maintain constant-time access to the maximum element in the list?

It's challenging (indeed, seemingly impossible) to achieve fast insertion, deletion, and maximum operations by extending a linked list! We'll be looking at other data structures later in the course for which all three operations (and more!) can be implemented efficiently.