Not Thread Safe Hash Table Assignment

From CSE231 Wiki
Jump to navigation Jump to search

Motivation

We gain experience implementing an abstract data type. Specifically, we will build a @NotThreadSafe class which implements the java.util.Map<K,V> interface. Later in the semester, we will implement a @ThreadSafe version: ConcurrentHashTable.

We gain experience with generics since the Map interface is parameterized with multiple generic types: K for the key type and V for the value type.

We will build some utilities to support hashing and key-value pairs which we will use throughout the semester.

Finally, the selection of methods we will implement includes two which you may not be familiar with: compute() and computeIfAbsent(). Each of these methods have special roles to play throughout the semester.

Background

Hash Tables

Video: How To Use java.util.Map  
Video: CS50 Shorts: Hash Tables  
Video: HackerRank Hash Tables  

Java Utilities To Use

Map.Entry<K,V>

interface Map.Entry<K,V>

entry.getKey()
entry.getValue()
entry.setValue(value)

Map<K,V>

interface Map<K, V>

map.get(key)
map.put(key, value)
map.computeIfAbsent(key, mappingFunction)
map.compute(key, remappingFunction)
map.entrySet()

ToIntFunction<T>

interface ToIntFunction<T>

function.applyAsInt(x)

Code To Investigate

Each of the example clients operate on the words of the Gettysburg Address.

PutClient

class: PutClient.java CLIENT
package: hashtable.notthreadsafe.client
source folder: student/src/main/java

This client uses repeated calls to map.put(key, value).

Each word is processed by taking its first letter (converted to lower case) and associating that letter with the word. Since put replaces a previous association, if it exists, the last word starting with a particular letter wins.

PutClient  
int chainCount = 8;
Map<Character, String> mapFirstCharacterToLastEncounteredWord = new NotThreadSafeHashTable<>(chainCount, Objects::hashCode);

for (TextSection textSection : TextSectionResource.GETTYSBURG_ADDRESS.textSections()) {
    for (String word : textSection.words()) {
        if (word.length() > 0) {
            char firstCharacterLowerCased = Character.toLowerCase(word.charAt(0));
            mapFirstCharacterToLastEncounteredWord.put(firstCharacterLowerCased, word);
        }
    }
}

for (char ch = 'a'; ch <= 'z'; ++ch) {
    String value = mapFirstCharacterToLastEncounteredWord.get(ch);
    System.out.println(ch + ": " + toString(value));
}
PutClient Output  
a: and
b: by
c: cause
d: died
e: earth
f: from
g: government
h: have
i: in
j: 
k: 
l: last
m: measure
n: not
o: of
p: perish
q: 
r: resolve
s: shall
t: the
u: under
v: vain
w: we
x: 
y: years
z: 

ComputeIfAbsentClient

class: ComputeIfAbsentClient.java CLIENT
package: hashtable.notthreadsafe.client
source folder: student/src/main/java

This client uses repeated calls to map.computeIfAbsent(key, mappingFunction).

Each word is processed by taking its first letter (converted to lower case) and associating that letter with a list. The first time a letter is encountered the mappingFunction will be invoked which returns the new list to associate with the letter. Conveniently, whether it has been associated before or the current iteration, the List associated with the letter is returned from computeIfAbsent. The current word is then added to that list. Since the returned list is a reference to mutable data there is no need to re-associate the list with the letter. When the loops are completed, each letter holds a list of all the encountered words (including any duplicates).

ComputeIfAbsentClient  
int chainCount = 8;
Map<Character, List<String>> mapFirstCharacterToListOfWords = new NotThreadSafeHashTable<>(chainCount, Objects::hashCode);

for (TextSection textSection : TextSectionResource.GETTYSBURG_ADDRESS.textSections()) {
    for (String word : textSection.words()) {
        if (word.length() > 0) {
            char firstCharacterLowerCased = Character.toLowerCase(word.charAt(0));
            List<String> list = mapFirstCharacterToListOfWords.computeIfAbsent(firstCharacterLowerCased,
                    (unusedKey) -> {
                        return new LinkedList<>();
                    });
            list.add(word);
        }
    }
}

for (char ch = 'a'; ch <= 'z'; ++ch) {
    List<String> value = mapFirstCharacterToListOfWords.get(ch);
    System.out.println(ch + ": " + toString(value));
}
ComputeIfAbsentClient Output  
a: [and, ago, a, and, all, are, are, a, any, and, are, a, a, as, a, altogether, and, a, and, above, add, advanced, a, and]
b: [brought, battle, But, brave, but, be, be, before, birth, by]
c: [continent, conceived, created, civil, conceived, can, come, can, can, consecrate, can, consecrated, can, cause]
d: [dedicated, dedicated, dedicate, do, dedicate, dead, detract, did, dedicated, dedicated, dead, devotion, devotion, dead, died]
e: [equal, engaged, endure, earth]
f: [Four, fathers, forth, field, field, final, for, fitting, far, forget, for, fought, far, for, from, for, full, freedom, for, from]
g: [great, great, gave, ground, great, gave, God, government]
h: [have, here, hallow, here, have, here, here, here, here, have, here, honored, here, highly, have, have]
i: [in, in, It, is, in, it, it, It, is, It, is, increased, in]
j: 
k: 
l: [Liberty, long, lives, live, larger, living, little, long, living, last]
m: [men, met, might, men, measure]
n: [new, nation, Now, nation, nation, nation, not, not, not, note, nor, never, nobly, not, nation, new, not]
o: [our, on, or, on, of, of, our, or, of, of, of]
p: [proposition, portion, place, proper, poor, power, people, people, people, perish]
q: 
r: [resting, remember, rather, rather, remaining, resolve]
s: [score, seven, so, so, should, sense, struggled, say, so, shall, shall, shall]
t: [this, to, the, that, testing, that, that, to, that, those, their, that, that, that, this, this, The, to, The, they, the, to, to, the, they, thus, to, to, the, task, that, these, take, to, that, they, the, that, that, these, that, this, that, the, the, the, the]
u: [us, unfinished, us, us, under]
v: [vain]
w: [we, war, whether, We, war, We, who, we, we, we, we, who, world, will, what, we, what, work, which, who, we, which, we]
x: 
y: [years]
z: 

ComputeClient

class: ComputeClient.java CLIENT
package: hashtable.notthreadsafe.client
source folder: student/src/main/java

This client uses repeated calls to map.compute(key, remappingFunction).

Each word is processed by computing the number of times the (lower case version of the) word has been encountered. Each time compute is called the remappingFunction is passed the key (which is (as is often the case) unused) and the previous value associated with the key, or null if has no association. The remapping function returns the value to associate with the key. Since we are counting the occurrences, we add 1 to the previous value, or simply return 1 if that previous value is null.

When the loops are completed, each word holds the number of times it was encountered..

ComputeClient  
int chainCount = 8;
Map<String, Integer> mapWordToCount = new NotThreadSafeHashTable<>(chainCount, Objects::hashCode);

for (TextSection textSection : TextSectionResource.GETTYSBURG_ADDRESS.textSections()) {
	for(String word : textSection.words()) {
		if (word.length() > 0) {
			String lowerCaseWord = word.toLowerCase();
			mapWordToCount.compute(lowerCaseWord, (unusedKey, prevValue) -> {
				if (prevValue != null) {
					return prevValue + 1;
				} else {
					return 1;
				}
			});
		}
	}
}

System.out.println(toString(mapWordToCount));
ComputeClient Output  
that => 13
the => 11
we => 10
here => 8
to => 8
a => 7
and => 6
can => 5
have => 5
for => 5
it => 5
not => 5
nation => 5
of => 5
in => 4
this => 4
dedicated => 4
they => 3
is => 3
are => 3
so => 3
dead => 3
shall => 3
great => 3
who => 3
us => 3
people => 3
new => 2
conceived => 2
war => 2
rather => 2
gave => 2
but => 2
living => 2
field => 2
from => 2
or => 2
be => 2
these => 2
our => 2
long => 2
what => 2
dedicate => 2
which => 2
men => 2
devotion => 2
on => 2
far => 2
created => 1
testing => 1
battle => 1
proper => 1
brave => 1
vain => 1
under => 1
ago => 1
all => 1
hallow => 1
add => 1
nor => 1
work => 1
honored => 1
cause => 1
government => 1
perish => 1
score => 1
as => 1
their => 1
poor => 1
world => 1
will => 1
note => 1
thus => 1
advanced => 1
increased => 1
earth => 1
engaged => 1
civil => 1
should => 1
do => 1
detract => 1
say => 1
unfinished => 1
continent => 1
equal => 1
any => 1
met => 1
come => 1
live => 1
never => 1
resolve => 1
died => 1
god => 1
seven => 1
brought => 1
forth => 1
whether => 1
endure => 1
those => 1
altogether => 1
consecrated => 1
above => 1
power => 1
remember => 1
forget => 1
task => 1
four => 1
years => 1
proposition => 1
now => 1
final => 1
resting => 1
sense => 1
little => 1
nobly => 1
remaining => 1
last => 1
measure => 1
freedom => 1
fathers => 1
liberty => 1
portion => 1
place => 1
lives => 1
might => 1
fitting => 1
larger => 1
consecrate => 1
ground => 1
struggled => 1
did => 1
fought => 1
before => 1
take => 1
full => 1
highly => 1
birth => 1
by => 1 

Code to Implement

Circle-information.svg Tip: It is good practice to declare variables, parameters, and fields to be of the least specific type possible (but no less specific :). This is commonly an interface.
Attention niels epting.svg Warning: Do NOT use == when comparing Objects (except when comparing to null). Use Objects.equals(a,b).
Attention niels epting.svg Warning: Do NOT use % which can return a negative number. Use Math.floorMod(x,y).
Attention niels epting.svg Warning: Do NOT parallelize.
Attention niels epting.svg Warning: Do NOT simply remove and add for put on a Map. Look for the entry and set its value if you find it. Otherwise add it.

DefaultHashFunction

class: DefaultHashFunction.java Java.png
methods: applyAsInt
package: hash.exercise
source folder: student/src/main/java

Hashing is a powerful tool. We will use it to build our NotThreadSafeHashTable, as well as the MatrixMapReduceFramework later in the semester. Hashing relies on evenly distributing keys to provide performance improvements. As such, it will be useful to specify a particular hash function for our Hash Table (and MatrixMapReduceFramework later) as they will want to be different.

Here we create a default hash function which implements ToIntFunction<T> which can serve as the hash function for our NotThreadSafeHashTable or wherever we need one.

We will just use the standard mechanism for acquiring a hashCode for an Object. Investigate SmearHashFunction which smears the hash code to provide a different result.

applyAsInt(key)

method: public int applyAsInt(K key) Sequential.svg (sequential implementation only)

Return the result of calling Objects.hashCode(key) which nicely handles null.

HashUtils

class: HashUtils.java Java.png
methods: hashCodeOf
arrayIndexForKey
itemInArrayForKey
package: hash.exercise
source folder: student/src/main/java

hashCodeOf(key, hashFunction)

method: public static <K> int hashCodeOf(K key, ToIntFunction<K> hashFunction) Sequential.svg (sequential implementation only)

Invoke the hashFunction's applyAsInt(value) method and return the result.

arrayIndexForKey(key, arrayLength, hashFunction)

method: public static <K> int arrayIndexForKey(K key, int arrayLength, ToIntFunction<K> hashFunction) Sequential.svg (sequential implementation only)

Using hashCodeOf with the provided key will return an int value between (inclusive) Integer.MIN_VALUE and Integer.MAX_VALUE, that is: between (inclusive) -2147483648 and 2147483647.

To convert this to a suitable array index, we would like to take the positive remainder of the value and the arrayLength.

Simply using the % operator will not do, as a negative hashCode would result in a negative array index (which would, of course, produce an array IndexOutOfBoundsException).

Using Math.floorMod(dividend,divisor) solves this problem by always returning a remainder with the sign of the divisor.

itemInArrayForKey(key, array, hashFunction)

method: public static <K, E> E itemInArrayForKey(K key, E[] array, ToIntFunction<K> hashFunction) Sequential.svg (sequential implementation only)

Leverage arrayIndexForKey to find the index in the array, then return the item at that index.

DefaultEntry<K,V>

Attention niels epting.svg Alert: This is in student-pre-aspect!!!

Whatever language you happen to be programming in there will be a way to associate keys with values. Whether the data structure is called a Dictionary in Python, a Hash in Ruby, an Association List in Lisp, or a Map<K,V> in Java, there will be a way to store the (key, value) pairs.

Java provides the interface Entry<K, V> which is used on all of its Maps. We will implement this interface in class DefaultEntry<K,V>.

class: DefaultEntry.java Java.png
methods: constructor
getKey
getValue
setValue
package: entry.exercise
source folder: student-pre-aspect/src/main/java

constructor DefaultEntry(key, value)

method: public DefaultEntry(K key, V value) Sequential.svg (sequential implementation only)

Hang onto the key and value specified as parameters in instance variables to be returned (and perhaps mutated in the case of value) later.

getKey()

method: public K getKey() Sequential.svg (sequential implementation only)

entry.getKey() specification

getValue()

method: public V getValue() Sequential.svg (sequential implementation only)

entry.getValue() specification

setValue()

method: public V setValue(V value) Sequential.svg (sequential implementation only)

entry.setValue(value) specification

NotThreadSafeHashTable

NotThreadSafeHashTable will use an entry’s key to place an entry into one of several buckets. Each bucket will hold a chain of entries. We maintain a chain of entries for when collisions naturally occur when multiple keys want to be placed in the same bucket.

HashTables are a popular data structure because we can cut down the runtime of searching for entries within the map by adding more buckets/chains. The time required to search for a particular entry drops to roughly 1/total_number_of_buckets (in comparison to one long chain which contains every entry).

hash table on wikipedia

In this class, you will use the entry’s key’s hashcode to determine which of the buckets it should fall into. In order to calculate the index of the bucket that the entry belongs in, we will make use of the hash function, which in turn uses Math.floorMod to calculate a modulo (remainder or % in Java) based on the number of buckets and the key’s hashcode.


Since your class will implement Map<K,V>. We will leave many of the methods as throw new NotRequiredForTheAssignmentException(); while we focus on some critical methods of import. Specifications for these methods can be found in the Map<K,V> javadocs

Attention niels epting.svg Warning: Note the specifications of the return values.

Further, the standard interface for a key value pair is Map's Entry with the critical methods:

  • K getKey()
  • V getValue()
  • V setValue(V value)


class: NotThreadSafeHashTable.java Java.png
methods: constructor
hashFunction
chains
entryOf
put
get
computeIfAbsent
compute
package: NotThreadSafeHashTable.notthreadsafe.exercise
source folder: student/src/main/java

constructor: NotThreadSafeHashTable(chainCount, hashFunction)

method: public NotThreadSafeHashTable(int chainCount, ToIntFunction<K> hashFunction) Sequential.svg (sequential implementation only)

Initialize chainCount chains in the form of an array of Collections of entries. Additionally, hang onto the hashFunction for later use.

Requirement:
Green check.svg In order to facilitate testing and debugging, create instances of either java.util.ArrayList or java.util.LinkedList to fill your array with.
Attention niels epting.svg Warning: Initializing the contents of an array is one of the few cases where the indexing for loop is superior to the for-each loop.

Use:

for (int i = 0; i < buckets.length; ++i)

in this case.

Implementing the constructor has a couple of tricky parts.

Parameterized Type Array Tip

First, you will encounter a much dreaded problem in Java. One cannot instantiate a new generic array. While there is a reason for this limitation, it is still frustrating.

So while you might think you should enter:

buckets = new Collection<Entry<K, V>>[capacity]; // NOTE: this will not compile

the error reported says it all "Cannot create a generic array of Collection<Map.Entry<K,V>>".

You must instead create the array without the generic type parameter:

buckets = new Collection[capacity];

This results in a warning which the @SuppressWarnings("unchecked") annotation we added to the constructor takes care of.

hashFunction()

method: public ToIntFunction<K> hashFunction() Sequential.svg (sequential implementation only)

Return the hashFunction provided to the constructor.

chains()

method: private Collection<Entry<K, V>>[] chains() Sequential.svg (sequential implementation only)

Return the chains created in the constructor. This method is required for the provided entrySet() method and provides the additional benefit of allowing the creation of the chains to be tested before moving on to the public methods (get, put, computeIfAbsent, compute) which have little hope of working correctly if the chains are not correct.


entryOf(chain, key)

method: private static <K, V> Entry<K, V> entryOf(Collection<Entry<K, V>> chain, K key) Sequential.svg (sequential implementation only)

Many of the methods listed above exist solely for the purpose of testing to assist in debugging. It can be argued that the abstraction is past any point of reasonable abstraction. Not so for entryOf(chain, key). It is perfectly reasonable and useful to build this method as it will be well used by get, put, computeIfAbsent, compute, and many other methods on Map.

Requirement:
Green check.svg Use the sequential for each loop to iterate through the chain. This is both good style and necessary for the testing.

put(key, value)

method: public V put(K key, V value) Sequential.svg (sequential implementation only)

Note the required return value in the put(key,value) specification.

Requirements:
Green check.svg Only the lone correct chain of entries should be searched.
Green check.svg The correct chain should be searched only once.
Green check.svg When you find an already existing Entry for a key, do not create a new Entry. Simply set the value of the existing Entry.
Green check.svg Do not call any of the methods on interface Map (for example get(key)). Your utility functions should be all that you need.
Attention niels epting.svg Alert: Use instances of DefaultEntry to store your key value pairs.

get(key)

method: public V get(Object key) Sequential.svg (sequential implementation only)

get(key) spec

Requirements:
Green check.svg Only the lone correct chain of entries should be searched.
Green check.svg The correct chain should be searched only once.

computeIfAbsent(key, mappingFunction)

While the computeIfAbsent(key, mappingFunction) specification uses invocations of get and put, you should do neither. You must achieve the computeIfAbsent specification, but do so akin to the approach taken by put: finding the chain for a key, finding the entry for the key (if it exists) in that chain, and so on.

!!! NOTE: you do NOT need to handle any cases involving remove.

Requirements:
Green check.svg Only the lone correct chain of entries should be searched.
Green check.svg The correct chain should be searched only once.
Green check.svg When you find an already existing Entry for a key, do not create a new Entry. Simply set the value of the existing Entry.
Green check.svg Do not call any of the methods on interface Map (for example get(key)). Your utility functions should be all that you need.
Attention niels epting.svg Alert: Use instances of DefaultEntry to store your key value pairs.

compute(key, remappingFunction)

While the compute(key, remappingFunction) specification uses invocations of get, put, and remove, you should do none of those things. You must achieve the compute specification, but do so akin to the approach taken by put: finding the chain for a key, finding the entry for the key (if it exists) in that chain, and so on.

!!! NOTE: you do NOT need to handle any cases involving remove.

Requirements:
Green check.svg Only the lone correct chain of entries should be searched.
Green check.svg The correct chain should be searched only once.
Green check.svg When you find an already existing Entry for a key, do not create a new Entry. Simply set the value of the existing Entry.
Green check.svg Do not call any of the methods on interface Map (for example get(key)). Your utility functions should be all that you need.
Attention niels epting.svg Alert: Use instances of DefaultEntry to store your key value pairs.


Testing Your Solution

class: __NotThreadSafeHashTableTestSuite.java Junit.png
package: hashtable.notthreadsafe.exercise
source folder: testing/src/test/java

Pledge, Acknowledgments, Citations

file: exercise-hash-table-pledge-acknowledgments-citations.txt

More info about the Honor Pledge