String Map K Mer Assignment

From CSE231 Wiki
Revision as of 00:47, 20 April 2023 by Cosgroved

Group Assignment

This is a group assignment.

Code To Investigate

java.util.concurrent

ConcurrentHashMap

StringKMers

The following method should be useful as you build the assignment.

String toString(byte[] sequence, int offset, int k)

import java.nio.charset.StandardCharsets;

/**
 * Returns the k-mer of length k that starts at the given offset in the
 * sequence, as a String. For example, given the sequence "ACCTGTCAAAA", calling
 * this method with an offset of 1 and a k of 4 returns "CCTG".
 * 
 * @param sequence the sequence of nucleobases to draw the bytes from
 * @param offset   the index of the first byte of the k-mer
 * @param k        the length of the k-mer to make a String for
 * @return a String representation of the k-mer at the desired position
 */
public static String toString(byte[] sequence, int offset, int k) {
	// decode k bytes starting at offset; nucleobase letters are ASCII, so each byte becomes one char
	return new String(sequence, offset, k, StandardCharsets.UTF_8);
}
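To see which Strings this method produces, here is a small sketch that slides the offset across the example sequence from the comment above. The class name KMerEnumerationDemo and the helper allKMers are hypothetical, and toString is a local copy of the provided method so the example compiles on its own. A sequence of length n contains n - k + 1 k-mers (none when n < k), which is why the loop stops at sequence.length - k.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KMerEnumerationDemo {
    // Local copy of StringKMers.toString so this sketch is self-contained.
    static String toString(byte[] sequence, int offset, int k) {
        return new String(sequence, offset, k, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: every k-mer, in order of its starting offset.
    // Offsets run from 0 to length - k inclusive; if k > length the loop never runs.
    static List<String> allKMers(byte[] sequence, int k) {
        List<String> kMers = new ArrayList<>();
        for (int offset = 0; offset <= sequence.length - k; offset++) {
            kMers.add(toString(sequence, offset, k));
        }
        return kMers;
    }

    public static void main(String[] args) {
        byte[] sequence = "ACCTGTCAAAA".getBytes(StandardCharsets.UTF_8);
        System.out.println(toString(sequence, 1, 4)); // prints CCTG
        System.out.println(allKMers(sequence, 4));
        // prints [ACCT, CCTG, CTGT, TGTC, GTCA, TCAA, CAAA, AAAA]
    }
}
```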

KMerResults

When we have completed our k-mer counting, we need to return the results. Since we will have multiple k-mer counters with several different data structures, we are provided with a common interface KMerResults.

public interface KMerResults {
	Iterable<byte[]> foundKMers();

	int getCount(byte[] kMer);
}
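One reason a String-keyed map is convenient here: Java arrays inherit equals and hashCode from Object, so two byte[] arrays with the same contents are not equal map keys, while two Strings with the same characters are. The sketch below shows one way a Map&lt;String, Integer&gt; could satisfy KMerResults by converting between byte[] and String at the interface boundary. SimpleStringMapKMerResults and MapBackedResultsDemo are hypothetical names; this is an illustration under the assumption of UTF-8 encoding, not the provided AbstractMapKMerResults or StringKMerCodec.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapBackedResultsDemo {
    // Copy of the KMerResults interface shown above.
    interface KMerResults {
        Iterable<byte[]> foundKMers();

        int getCount(byte[] kMer);
    }

    // Hypothetical adapter; NOT the provided StringMapKMerResults implementation.
    static final class SimpleStringMapKMerResults implements KMerResults {
        private final Map<String, Integer> map;

        SimpleStringMapKMerResults(Map<String, Integer> map) {
            this.map = map;
        }

        @Override
        public Iterable<byte[]> foundKMers() {
            // Convert each String key back to the byte[] form the interface expects.
            List<byte[]> kMers = new ArrayList<>();
            for (String key : map.keySet()) {
                kMers.add(key.getBytes(StandardCharsets.UTF_8));
            }
            return kMers;
        }

        @Override
        public int getCount(byte[] kMer) {
            // Look up by String value; a k-mer that never appeared has count 0.
            return map.getOrDefault(new String(kMer, StandardCharsets.UTF_8), 0);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("ACG", 2);
        KMerResults results = new SimpleStringMapKMerResults(counts);

        byte[] a = "ACG".getBytes(StandardCharsets.UTF_8);
        byte[] b = "ACG".getBytes(StandardCharsets.UTF_8);
        System.out.println(a.equals(b));         // false: arrays compare by identity
        System.out.println(Arrays.equals(a, b)); // true: element-by-element comparison
        System.out.println(results.getCount(b)); // 2, because the lookup goes through String
    }
}
```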

StringMapKMerResults

For this particular assignment we will be using instances of Map<String, Integer>, so the provided class StringMapKMerResults will serve well:

public final class StringMapKMerResults extends AbstractMapKMerResults<String> {
	public StringMapKMerResults(int k, Map<String, Integer> map) {
		super(k, map, StringKMerCodec.INSTANCE);
	}
}

Code To Implement

StringHashMapKMerCounter

class: StringHashMapKMerCounter.java
methods: parse
package: kmer.group
source folder: student/src/main/java

parse

method: public StringMapKMerResults parse(List<byte[]> sequences, int k) (sequential implementation only)

In this completely sequential implementation, you will write the parse method. It takes a list of byte arrays and a k-mer length, and it should return an instance of StringMapKMerResults (which takes a map), a provided class that does exactly what its name suggests.

parse should visit every possible k-mer in each byte array in the list of sequences; a sequence of length n contains n - k + 1 of them (none if n < k). For each offset, use the StringKMers.toString(sequence, offset, k) method to create the String key for the HashMap. The map should use String keys and Integer values, where each value is the number of times that k-mer occurs. We recommend using the map.compute() method and reviewing how to use lambdas.
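As an illustration of the compute-plus-lambda counting pattern, here is a minimal sequential sketch. It is not the required StringHashMapKMerCounter: the class SequentialKMerCountSketch and method countKMers are hypothetical, a local toString stands in for StringKMers.toString, and it returns the raw map instead of a StringMapKMerResults. Map.compute passes the key and the current value (null if the key is absent) to the lambda and stores whatever the lambda returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SequentialKMerCountSketch {
    // Local copy of StringKMers.toString so this sketch is self-contained.
    static String toString(byte[] sequence, int offset, int k) {
        return new String(sequence, offset, k, StandardCharsets.UTF_8);
    }

    // Hypothetical sketch: counts every k-mer across all sequences.
    static Map<String, Integer> countKMers(List<byte[]> sequences, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (byte[] sequence : sequences) {
            // offsets 0 .. length - k inclusive give length - k + 1 k-mers
            for (int offset = 0; offset <= sequence.length - k; offset++) {
                String kMer = toString(sequence, offset, k);
                // null means "not seen yet", so start at 1; otherwise add 1
                counts.compute(kMer, (key, count) -> count == null ? 1 : count + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<byte[]> sequences = List.of("ACGTACGT".getBytes(StandardCharsets.UTF_8));
        Map<String, Integer> counts = countKMers(sequences, 3);
        System.out.println(counts.get("ACG")); // prints 2
        System.out.println(counts.get("GTA")); // prints 1
    }
}
```

In the actual assignment, the finished map would be wrapped with the constructor shown earlier, new StringMapKMerResults(k, counts).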

StringConcurrentHashMapKMerCounter

class: StringConcurrentHashMapKMerCounter.java
methods: parse
package: kmer.group
source folder: student/src/main/java

parse

method: public StringMapKMerResults parse(List<byte[]> sequences, int k) (parallel implementation required)

This implementation turns your sequential String HashMap implementation into a parallel one. To do so, you will use Java’s thread-safe Map implementation, ConcurrentHashMap. As before, you will need to complete the parse method, but this time in parallel.
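A parallel version can be sketched with the JDK's own parallel streams, under the assumption that any fork/join style the course prefers would slot in where the streams are. As before, ConcurrentKMerCountSketch and countKMers are hypothetical names, a local toString replaces StringKMers.toString, and the raw map is returned. The sketch parallelizes across sequences and, within each sequence, across offsets; every update goes through ConcurrentHashMap.compute.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ConcurrentKMerCountSketch {
    // Local copy of StringKMers.toString so this sketch is self-contained.
    static String toString(byte[] sequence, int offset, int k) {
        return new String(sequence, offset, k, StandardCharsets.UTF_8);
    }

    // Hypothetical sketch: parallel over sequences and over offsets within each one.
    static Map<String, Integer> countKMers(List<byte[]> sequences, int k) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        sequences.parallelStream().forEach(sequence ->
                // rangeClosed(0, length - k) is empty when k > length
                IntStream.rangeClosed(0, sequence.length - k).parallel().forEach(offset -> {
                    String kMer = toString(sequence, offset, k);
                    // compute performs this read-modify-write atomically for the key
                    counts.compute(kMer, (key, count) -> count == null ? 1 : count + 1);
                }));
        return counts;
    }

    public static void main(String[] args) {
        byte[] sequence = "ACGTACGT".getBytes(StandardCharsets.UTF_8);
        // The same read-only array repeated 1,000 times produces many concurrent updates.
        List<byte[]> sequences = Collections.nCopies(1_000, sequence);
        Map<String, Integer> counts = countKMers(sequences, 3);
        System.out.println(counts.get("ACG")); // prints 2000
        System.out.println(counts.get("GTA")); // prints 1000
    }
}
```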

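Recall that using a ConcurrentHashMap will NOT, on its own, prevent races in your code. Each individual call (get, put) is thread-safe, but a get/modify/put sequence is not atomic: another thread can update the same key between your get and your put, and an increment is lost. Use a method like compute, which performs the whole read-modify-write atomically, to prevent these atomicity races.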
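The difference between the get/modify/put pattern and compute can be observed directly. In this hypothetical demo (AtomicityRaceDemo, racyCount, and computeCount are illustrative names), many parallel tasks increment the same key. The get/put version can lose updates, so its result may come out below the number of increments, and how often that happens varies from run to run; the compute version always reaches the full count because ConcurrentHashMap documents that the entire compute call is performed atomically.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class AtomicityRaceDemo {
    // Each get and put is thread-safe, but the pair is not atomic, so updates can be lost.
    static int racyCount(int increments) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        IntStream.range(0, increments).parallel().forEach(i -> {
            Integer count = map.get("ACG");
            // another thread may have updated "ACG" since the get above
            map.put("ACG", count == null ? 1 : count + 1);
        });
        return map.getOrDefault("ACG", 0);
    }

    // compute runs the whole read-modify-write for the key as one atomic step.
    static int computeCount(int increments) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        IntStream.range(0, increments).parallel().forEach(i ->
                map.compute("ACG", (key, count) -> count == null ? 1 : count + 1));
        return map.getOrDefault("ACG", 0);
    }

    public static void main(String[] args) {
        System.out.println("get/put: " + racyCount(1_000_000));    // may print less than 1000000
        System.out.println("compute: " + computeCount(1_000_000)); // prints 1000000
    }
}
```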

Get then put is not atomic.
Call compute. Call compute.
Use ConcurrentHashMap. Use ConcurrentHashMap.
Or say shoot. Or say shoot.


Testing Your Solution

class: __StringKMerTestSuite.java
package: kmer.group
source folder: testing/src/test/java

sequential

class: _StringSequentialKMerTestSuite.java
package: kmer.group
source folder: testing/src/test/java

concurrent

class: _StringConcurrentKMerTestSuite.java
package: kmer.group
source folder: testing/src/test/java