K Mer Balance Assignment
Contents
Motivation
Code To Implement
DefaultByteArrayRange
class: | DefaultByteArrayRange.java | |
methods: | constructor originalCompleteSequence min maxExclusive |
|
package: | kmer.balance.exercise | |
source folder: | student/src/main/java |
constructor and instance variables
constructor: public DefaultByteArrayRange(byte[] originalCompleteSequence, int min, int maxExclusive)
originalCompleteSequence
method: public byte[] originalCompleteSequence()
(sequential implementation only)
Return the originalCompleteSequence passed to the constructor.
min
method: public int min()
(sequential implementation only)
Return the min passed to the constructor.
maxExclusive
method: int maxExclusive()
(sequential implementation only)
Return the maxExclusive passed to the constructor.
DefaultKMerBalancer
DefaultKMerBalancer exists to balance the workload of a list of dna sub-sequences. We have balanced workload before by breaking up data into as equally sized slices in the Ranges assignment. We used the sliced up Ranges in everything from the Coarsening Nucleobase Count to Matrix MapReduce.
K-mer counting presents an additional challenge of having to deal with sub-sequences of radically different lengths.
class: | DefaultKMerBalancer.java | |
methods: | constructor byteArrayRangeCreator sliceKernel createSlices calculateReasonableThreshold |
|
package: | kmer.balance.exercise | |
source folder: | student/src/main/java |
Tip: Get DefaultKMerBalancer working and then use it to balance all of your parallel KMerCounter implementations. |
constructor
In a bit of overdone abstraction, a byteArrayRangeCreator is passed to the constructor to be used instead of creating DefaultByteArrayRanges directly. This is done to allow the testing to help catch bugs earlier. Hang onto this byteArrayRangeCreator in an instance variable to use later.
byteArrayRangeCreator
Return the byteArrayRangeCreator passed to the constructor.
createSlices
method: public static List<Slice<byte[]>> createSlices(List<byte[]> sequences, int k, IntPredicate slicePredicate)
(sequential implementation only)
Warning: be sure to slice the offset (a.k.a. startingIndex) space |
for N=10 and K=3, you should be slicing like this:
Video: ThresholdSlices |
---|
sliceKernel
method: private static void sliceKernel(List<ByteArrayRange> slices, byte[] sequence, int min, int max,
IntPredicate slicePredicate)
(sequential implementation only)
Here we will employ a divide-and-conquer like strategy to fill the slices list with ByteArrayRanges that are all not too long. We leverage the provided slicePredicate to test whether or not a range length from min to max is long enough to warrant dividing further.
threshold calculation
method: public static int calculateReasonableThreshold(List<byte[]> sequences, int k)
(sequential implementation only)
Define the threashold of slice length used in slicePredicate.
Question to ask yourself: given a list of sequences which are to be k-mer counted, how would you calculate a reasonable threshold?
Testing Your Solution
class: | __BalanceTestSuite.java | |
package: | kmer.balance.exercise | |
source folder: | testing/src/test/java |
Pledge, Acknowledgments, Citations
file: | kmer-balance-pledge-acknowledgments-citations.txt |
More info about the Honor Pledge