K Mer Balance Assignment
Contents
Code To Implement
DefaultByteArrayRange
class: | DefaultByteArrayRange.java | |
methods: | constructor originalCompleteSequence min maxExclusive |
|
package: | kmer.balance.exercise | |
source folder: | student/src/main/java |
constructor and instance variables
originalCompleteSequence
min
maxExclusive
DefaultByteArrayRangeCreator
class: | DefaultByteArrayRangeCreator.java | |
methods: | apply | |
package: | kmer.balance.exercise | |
source folder: | student/src/main/java |
apply(originalCompleteSequence, min, maxExclusive)
method: public ByteArrayRange apply(byte[] originalCompleteSequence, int min, int maxExclusive))
(sequential implementation only)
DefaultKMerBalancer
DefaultKMerBalancer exists to balance the workload of a list of dna sub-sequences. We have balanced workload before by breaking up data into as equally sized slices in the Ranges assignment. We used the sliced up Ranges in everything from the Coarsening Nucleobase Count to Matrix MapReduce.
K-mer counting presents an additional challenge of having to deal with sub-sequences of radically different lengths.
class: | DefaultKMerBalancer.java | |
methods: | constructor byteArrayRangeCreator sliceKernel createSlices calculateReasonableThreshold |
|
package: | kmer.balance.exercise | |
source folder: | student/src/main/java |
Tip: Get DefaultKMerBalancer working and then use it to balance all of your parallel KMerCounter implementations. |
constructor
In a bit of overdone abstraction, a byteArrayRangeCreator is passed to the constructor to be used instead of creating DefaultByteArrayRanges directly. This is done to allow the testing to help catch bugs earlier. Hang onto this byteArrayRangeCreator in an instance variable to use later.
byteArrayRangeCreator
Return the byteArrayRangeCreator passed to the constructor.
createSlices
method: public static List<Slice<byte[]>> createSlices(List<byte[]> sequences, int k, IntPredicate slicePredicate)
(sequential implementation only)
Warning: be sure to slice the offset (a.k.a. startingIndex) space |
for N=10 and K=3, you should be slicing like this:
sliceKernel
method: private static void sliceKernel(List<ByteArrayRange> slices, byte[] sequence, int min, int max,
IntPredicate slicePredicate)
(sequential implementation only)
Here we will employ a divide-and-conquer like strategy to fill the slices list with ByteArrayRanges that are all not too long. We leverage the provided slicePredicate to test whether or not a range length from min to max is long enough to warrant dividing further.
threshold calculation
method: public static int calculateReasonableThreshold(List<byte[]> sequences, int k)
(sequential implementation only)
Define the threashold of slice length used in slicePredicate.
Question to ask yourself: given a list of sequences which are to be k-mer counted, how would you calculate a reasonable threshold?
Testing Your Solution
Correctness
class: | __BalanceTestSuite.java | |
package: | kmer.balance.exercise | |
source folder: | testing/src/test/java |