String Map K Mer Assignment
Group Assignment
This is a group assignment.
Code To Investigate
java.util.concurrent
StringKMers
The following method should be useful as you build the assignment.
String toString(byte[] sequence, int offset, int kMerLength)
/**
 * Stores the information from the given sequence into a String. For example, if
 * you had the sequence, "ACCTGTCAAAA" and you called this method with an offset
 * of 1 and a k of 4, it would return "CCTG".
 *
 * @param sequence the sequence of nucleobases to draw the bytes from
 * @param offset the offset for where to start looking for bytes
 * @param k the length of the k-mer to make a String for
 * @return a String representation of the k-mer at the desired position
 */
public static String toString(byte[] sequence, int offset, int k) {
    return new String(sequence, offset, k, StandardCharsets.UTF_8);
}
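To make the offset and k arguments concrete, here is a minimal sketch that reproduces the Javadoc example. The class name KMerToStringDemo is hypothetical, and its local toString simply copies the method body shown above so the example runs without the course library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the assignment code.
public class KMerToStringDemo {
    // Copy of the StringKMers.toString body shown above, so this file is self-contained.
    static String toString(byte[] sequence, int offset, int k) {
        return new String(sequence, offset, k, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sequence = "ACCTGTCAAAA".getBytes(StandardCharsets.UTF_8);
        int k = 4;

        // The Javadoc example: offset 1, k of 4.
        System.out.println(toString(sequence, 1, k)); // prints CCTG

        // The last valid offset is sequence.length - k; a larger offset
        // would read past the end of the array.
        int lastOffset = sequence.length - k;
        System.out.println(toString(sequence, lastOffset, k)); // prints AAAA
    }
}
```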
KMerResults
When we have completed our k-mer counting, we need to return the results. Since we will have multiple k-mer counters with several different data structures, we are provided with a common interface KMerResults.
public interface KMerResults {
Iterable<byte[]> foundKMers();
int getCount(byte[] kMer);
}
StringMapKMerResults
For this particular assignment we will be using instances of Map<String, Integer>, so the provided class StringMapKMerResults will serve well:
public final class StringMapKMerResults extends AbstractMapKMerResults<String> {
public StringMapKMerResults(int k, Map<String, Integer> map) {
super(k, map, StringKMerCodec.INSTANCE);
}
}
Code To Implement
StringHashMapKMerCounter
class: StringHashMapKMerCounter.java
methods: parse
package: kmer.group
source folder: student/src/main/java
parse
method: public StringMapKMerResults parse(List<byte[]> sequences, int k)
(sequential implementation only)
In this completely sequential implementation, you will write the parse method. The method takes a list of byte arrays and a k-mer length k. It should return an instance of StringMapKMerResults, a provided class whose constructor takes k and the map of counts you build.
parse should visit every possible k-mer in every byte array in the list of sequences. A sequence of length n contains n - k + 1 k-mers, one starting at each offset from 0 through n - k. For each offset, use the StringKMers.toString(sequence, offset, k) method to create the String to use as a key for the HashMap. The map should use a String as the key and an Integer count as the value. We recommend using the map.compute() method and reviewing how to use lambdas.
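The loop bounds and the compute call can be sketched as follows. This is only an illustration under stated assumptions: the class SequentialKMerCountSketch and the method countKMers are hypothetical names, the sketch returns a plain Map rather than a StringMapKMerResults, and it builds each key with the same new String(...) call that StringKMers.toString uses so that it runs on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not the required StringHashMapKMerCounter.
public class SequentialKMerCountSketch {
    static Map<String, Integer> countKMers(List<byte[]> sequences, int k) {
        Map<String, Integer> map = new HashMap<>();
        for (byte[] sequence : sequences) {
            // A sequence of length n holds n - k + 1 k-mers
            // (none at all if the sequence is shorter than k).
            int kMerCount = sequence.length - k + 1;
            for (int offset = 0; offset < kMerCount; offset++) {
                String kMer = new String(sequence, offset, k, StandardCharsets.UTF_8);
                // The lambda receives the current value, or null if the key is absent.
                map.compute(kMer, (key, old) -> old == null ? 1 : old + 1);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<byte[]> sequences = List.of("ACACA".getBytes(StandardCharsets.UTF_8));
        Map<String, Integer> counts = countKMers(sequences, 2);
        System.out.println(counts.get("AC")); // prints 2
        System.out.println(counts.get("CA")); // prints 2
    }
}
```

The lambda passed to compute handles both the first sighting of a k-mer (old is null) and every later one, which avoids a separate containsKey check.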
StringConcurrentHashMapKMerCounter
class: StringConcurrentHashMapKMerCounter.java
methods: parse
package: kmer.group
source folder: student/src/main/java
parse
method: public StringMapKMerResults parse(List<byte[]> sequences, int k)
(parallel implementation required)
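This implementation will make your sequential String HashMap implementation into a parallel one. To do so, you will be making use of Java’s thread-safe version of a Map: ConcurrentHashMap. Like before, you will need to complete the parse method, but this time in parallel.
Recall that ConcurrentHashMap alone will NOT prevent every race. Each individual method call is thread-safe, but a get/modify/put sequence is not atomic: another thread can update the entry between your get and your put, and one of the increments is lost. We need to use a method like compute, which performs the whole read-modify-write as one atomic step, to prevent these atomicity races.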
- Get then put is not atomic.
- Call compute. Call compute.
- Use ConcurrentHashMap. Use ConcurrentHashMap.
- Or say shoot. Or say shoot.
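The difference between compute and get-then-put can be seen in a small experiment. This is a sketch under stated assumptions: the class AtomicIncrementSketch and its method names are hypothetical, and it uses a parallel IntStream from the standard library to create contention, which may differ from the parallel constructs used in the course.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical sketch class; not the required StringConcurrentHashMapKMerCounter.
public class AtomicIncrementSketch {
    // Each increment is one atomic compute call, so no update is lost.
    static int countWithCompute(int increments) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        IntStream.range(0, increments).parallel().forEach(i ->
                map.compute("ACGT", (key, old) -> old == null ? 1 : old + 1));
        return map.get("ACGT");
    }

    // Get then put is not atomic: two threads can read the same old value
    // and both write old + 1, losing an increment.
    static int countWithGetThenPut(int increments) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        IntStream.range(0, increments).parallel().forEach(i -> {
            Integer old = map.get("ACGT");
            map.put("ACGT", old == null ? 1 : old + 1);
        });
        return map.get("ACGT");
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("compute:      " + countWithCompute(n));    // always n
        System.out.println("get then put: " + countWithGetThenPut(n)); // may be less than n
    }
}
```

The get-then-put total varies from run to run and can even come out correct on occasion, which is what makes this kind of bug hard to catch with testing alone.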
Videos
Video: ConcurrentHashMap compute
Video: StringHashMapKMerCounter
Video: StringConcurrentHashMapKMerCounter
Testing Your Solution
class: __StringKMerTestSuite.java
package: kmer.group
source folder: testing/src/test/java