String Map K Mer Assignment
Group Assignment
This is a group assignment.
Code To Investigate
Java Util Concurrent
StringKMers
The following method should be useful as you build the assignment.
String toString(byte[] sequence, int offset, int kMerLength)
/**
 * Stores the information from the given sequence into a String. For example, if
 * you had the sequence, "ACCTGTCAAAA" and you called this method with an offset
 * of 1 and a k of 4, it would return "CCTG".
 *
 * @param sequence the sequence of nucleobases to draw the bytes from
 * @param offset the offset for where to start looking for bytes
 * @param k the length of the k-mer to make a String for
 * @return a String representation of the k-mer at the desired position
 */
public static String toString(byte[] sequence, int offset, int k) {
    return new String(sequence, offset, k, StandardCharsets.UTF_8);
}
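To see how the offset and k interact, the following minimal sketch walks every valid offset of the example sequence. The class name KMerToStringDemo is hypothetical, and its toString method is a local stand-in that mirrors the body shown above so that the example runs on its own.

```java
import java.nio.charset.StandardCharsets;

final class KMerToStringDemo {
    // Local stand-in with the same body as StringKMers.toString shown above.
    static String toString(byte[] sequence, int offset, int k) {
        return new String(sequence, offset, k, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sequence = "ACCTGTCAAAA".getBytes(StandardCharsets.UTF_8);
        int k = 4;
        // A sequence of length n has n - k + 1 k-mers: offsets 0 through n - k.
        for (int offset = 0; offset <= sequence.length - k; offset++) {
            System.out.println(offset + ": " + toString(sequence, offset, k));
        }
    }
}
```

The loop bound matters: an offset greater than sequence.length - k would ask the String constructor to read past the end of the array.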
KMerResults
When we have completed our k-mer counting, we need to return the results. Since we will have multiple k-mer counters with several different data structures, we are provided with a common interface KMerResults.
public interface KMerResults {
    Iterable<byte[]> foundKMers();

    int getCount(byte[] kMer);
}
StringMapKMerResults
For this particular assignment we will be using instances of Map<String, Integer>, so the provided class StringMapKMerResults will serve well:
public final class StringMapKMerResults extends AbstractMapKMerResults<String> {
    public StringMapKMerResults(int k, Map<String, Integer> map) {
        super(k, map, StringKMerCodec.INSTANCE);
    }
}
Code To Implement
StringHashMapKMerCounter
class: StringHashMapKMerCounter.java
methods: parse
package: kmer.group
source folder: student/src/main/java
parse
method: public StringMapKMerResults parse(List<byte[]> sequences, int k)
(sequential implementation only)
In this completely sequential implementation, you will write the parse method. The method takes a list of byte arrays and a k-mer length. It should return an instance of StringMapKMerResults, a provided class whose constructor takes k and a Map<String, Integer>.
parse should visit every possible k-mer in each byte array in the list of sequences. A sequence of length n contains n - k + 1 k-mers, one for each offset from 0 through n - k. For each offset, use the StringKMers.toString(sequence, offset, k) method to create a String to use as a key in a HashMap. The map should use a String as the key and an Integer as the value. We recommend using the map.compute() method and reviewing how to use lambdas.
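As a rough illustration of the map.compute() idiom, here is a minimal sketch of sequential counting. It is not the required solution: the class SequentialCountSketch and the method countKMers are hypothetical names, the sketch returns the raw map instead of wrapping it in StringMapKMerResults, and it builds each key with the String constructor directly rather than calling StringKMers.toString, so that it runs without the course code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SequentialCountSketch {
    // Hypothetical helper: counts every k-mer in every sequence.
    static Map<String, Integer> countKMers(List<byte[]> sequences, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (byte[] sequence : sequences) {
            // Offsets 0 through length - k; the loop is skipped if the sequence is shorter than k.
            for (int offset = 0; offset <= sequence.length - k; offset++) {
                String kMer = new String(sequence, offset, k, StandardCharsets.UTF_8);
                // The lambda receives the current value, or null if the key is absent.
                counts.compute(kMer, (key, old) -> old == null ? 1 : old + 1);
            }
        }
        return counts;
    }
}
```

The lambda passed to compute has to handle null because the first time a k-mer is seen there is no existing count.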
StringConcurrentHashMapKMerCounter
class: StringConcurrentHashMapKMerCounter.java
methods: constructor, concurrentMapFactory, createConcurrentMap, parse
package: kmer.group
source folder: student/src/main/java
Note: this class forces a somewhat ridiculous amount of abstraction when it comes to constructing new ConcurrentMaps. The constructor will be passed a Supplier to be used instead of, for example, constructing a ConcurrentHashMap directly. This allows the testing to catch errors sooner in an effort to aid debugging.
constructor and instance variable
Hang onto the concurrentMapFactory passed to the constructor in an instance variable so you can use it later.
public StringConcurrentHashMapKMerCounter(Supplier<ConcurrentMap<String, Integer>> concurrentMapFactory)
concurrentMapFactory
Return the concurrentMapFactory passed to the constructor.
createConcurrentMap
Use the concurrentMapFactory's get() method to create a new ConcurrentMap.
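The three pieces above (constructor, accessor, and createConcurrentMap) fit together roughly as in the sketch below. The class name FactoryHolderSketch is hypothetical and stands in for the counter class; only the Supplier handling is shown, under the assumption that the accessor and factory methods have the shapes their descriptions suggest.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

final class FactoryHolderSketch {
    // Kept so that maps can be created later, on demand.
    private final Supplier<ConcurrentMap<String, Integer>> concurrentMapFactory;

    FactoryHolderSketch(Supplier<ConcurrentMap<String, Integer>> concurrentMapFactory) {
        this.concurrentMapFactory = concurrentMapFactory;
    }

    // Returns the same factory that was passed to the constructor.
    Supplier<ConcurrentMap<String, Integer>> concurrentMapFactory() {
        return concurrentMapFactory;
    }

    // Each call asks the factory for a new map instead of calling a constructor directly.
    ConcurrentMap<String, Integer> createConcurrentMap() {
        return concurrentMapFactory.get();
    }

    public static void main(String[] args) {
        FactoryHolderSketch holder = new FactoryHolderSketch(ConcurrentHashMap::new);
        ConcurrentMap<String, Integer> map = holder.createConcurrentMap();
        System.out.println(map.isEmpty());
    }
}
```

A caller can pass ConcurrentHashMap::new, as in main, while a test harness can pass a Supplier that returns an instrumented map.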
parse
method: public StringMapKMerResults parse(List<byte[]> sequences, int k)
(parallel implementation required)
This implementation turns your sequential String HashMap implementation into a parallel one. To do so, you will use Java’s thread-safe implementation of Map: ConcurrentHashMap. As before, you will need to complete the parse method, but this time in parallel.
Recall that the ConcurrentHashMap will prevent data races. We need to use a method like compute instead of a get/modify/put pattern to prevent atomicity races.
Get then put is not atomic
Call compute. Call compute.
Use ConcurrentHashMap! Use ConcurrentHashMap!
Or say shoot. Or say shoot.
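To make the atomicity point concrete, here is a minimal sketch that counts k-mers from many threads with compute. It makes several assumptions: the class ParallelCountSketch and the method countKMers are hypothetical names, a standard-library parallel stream stands in for whatever parallel constructs the course provides, the map is constructed directly rather than through the concurrentMapFactory, and the raw map is returned instead of a StringMapKMerResults.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ParallelCountSketch {
    // Hypothetical helper: one parallel task per sequence, all sharing one map.
    static ConcurrentMap<String, Integer> countKMers(List<byte[]> sequences, int k) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        sequences.parallelStream().forEach(sequence -> {
            for (int offset = 0; offset <= sequence.length - k; offset++) {
                String kMer = new String(sequence, offset, k, StandardCharsets.UTF_8);
                // Not atomic (two threads can read the same old value and lose an update):
                //   Integer old = counts.get(kMer);
                //   counts.put(kMer, old == null ? 1 : old + 1);
                // Atomic on ConcurrentHashMap: the read-modify-write happens as one step per key.
                counts.compute(kMer, (key, old) -> old == null ? 1 : old + 1);
            }
        });
        return counts;
    }
}
```

With the commented-out get-then-put version, each individual call is still thread-safe, but the pair is not, so counts can come out too low when the same k-mer is seen by two threads at once.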
Testing Your Solution
class: __StringKMerTestSuite.java
package: kmer.group
source folder: testing/src/test/java
sequential
class: _StringSequentialKMerTestSuite.java
package: kmer.group
source folder: testing/src/test/java
concurrent
class: _StringConcurrentKMerTestSuite.java
package: kmer.group
source folder: testing/src/test/java