K Mer Balance Assignment

From CSE231 Wiki

Motivation

Code To Implement

DefaultByteArrayRange

class: DefaultByteArrayRange.java
methods: constructor, originalCompleteSequence, min, maxExclusive
package: kmer.balance.exercise
source folder: student/src/main/java

constructor and instance variables

constructor: public DefaultByteArrayRange(byte[] originalCompleteSequence, int min, int maxExclusive)

originalCompleteSequence

method: public byte[] originalCompleteSequence() (sequential implementation only)

Return the originalCompleteSequence passed to the constructor.

min

method: public int min() (sequential implementation only)

Return the min passed to the constructor.

maxExclusive

method: public int maxExclusive() (sequential implementation only)

Return the maxExclusive passed to the constructor.
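The three accessors above simply hand back what the constructor received. As a minimal sketch of that shape (not the course's class: `ByteArrayRangeSketch` is a hypothetical name, and the real DefaultByteArrayRange lives in kmer.balance.exercise and may have to satisfy a course-provided interface), something like the following would do:

```java
import java.util.Objects;

// Hypothetical stand-in used only for illustration; the name is not from the assignment.
final class ByteArrayRangeSketch {
    // The values passed to the constructor are stored unchanged in final fields.
    private final byte[] originalCompleteSequence;
    private final int min;
    private final int maxExclusive;

    ByteArrayRangeSketch(byte[] originalCompleteSequence, int min, int maxExclusive) {
        // The null check is an assumption of this sketch, not a stated requirement.
        this.originalCompleteSequence = Objects.requireNonNull(originalCompleteSequence);
        this.min = min;
        this.maxExclusive = maxExclusive;
    }

    // Returns the same array reference that was passed in (no copy is made).
    byte[] originalCompleteSequence() {
        return originalCompleteSequence;
    }

    int min() {
        return min;
    }

    int maxExclusive() {
        return maxExclusive;
    }
}
```

Note that the range describes a window into the complete sequence without copying any bytes, which is what makes slicing cheap.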

DefaultKMerBalancer

DefaultKMerBalancer exists to balance the workload across a list of DNA sub-sequences. We have balanced workloads before: in the Ranges assignment, we broke data into slices of as nearly equal size as possible. We used those sliced-up Ranges in everything from the Coarsening Nucleobase Count to Matrix MapReduce.

K-mer counting presents an additional challenge of having to deal with sub-sequences of radically different lengths.

class: DefaultKMerBalancer.java
methods: sliceKernel, createSlices, calculateReasonableThreshold
package: kmer.balance.exercise
source folder: student/src/main/java
Tip: Get DefaultKMerBalancer working and then use it to balance all of your parallel KMerCounter implementations.

createSlices

method: public static List<Slice<byte[]>> createSlices(List<byte[]> sequences, int k, IntPredicate slicePredicate) (sequential implementation only)

Warning: be sure to slice the offset (a.k.a. startingIndex) space.

For N=10 and K=3, you should be slicing like this:

Figure: ThresholdSlices.svg

Video: ThresholdSlices  
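To make the offset space concrete: a sequence of length N contains N - K + 1 k-mers, one per starting index, so for N=10 and K=3 the offsets to slice are 0 through 7 rather than 0 through 9. The sketch below only illustrates that arithmetic; `OffsetSpaceSketch`, `offsetCount`, and the sample sequence are hypothetical and not part of the assignment.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class OffsetSpaceSketch {
    // Number of valid k-mer starting offsets in a sequence of length n.
    // A sequence shorter than k contains no k-mers at all.
    static int offsetCount(int n, int k) {
        return Math.max(0, n - k + 1);
    }

    public static void main(String[] args) {
        byte[] sequence = "ACGTACGTAC".getBytes(StandardCharsets.US_ASCII); // N = 10
        int k = 3;
        int count = offsetCount(sequence.length, k); // 10 - 3 + 1 = 8
        // Each offset in [0, count) is the starting index of exactly one k-mer.
        for (int offset = 0; offset < count; offset++) {
            String kmer = new String(sequence, offset, k, StandardCharsets.US_ASCII);
            System.out.println(offset + ": " + kmer);
        }
    }
}
```

Slicing the index range [0, N) instead would produce slices whose last offsets start k-mers that run past the end of the sequence.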

sliceKernel

method: private static void sliceKernel(List<ByteArrayRange> slices, byte[] sequence, int min, int max, IntPredicate slicePredicate) (sequential implementation only)

Here we employ a divide-and-conquer-like strategy to fill the slices list with ByteArrayRanges, none of which is too long. We use the provided slicePredicate to test whether the length of the range from min to max is long enough to warrant dividing further.
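One way to picture the recursion is sketched below. It is a sketch under stated assumptions, not the reference solution: `OffsetRange` is a hypothetical stand-in for the course's ByteArrayRange, max is assumed to be exclusive, the predicate is assumed to return true when a length should be divided further, and splitting at the midpoint is just one reasonable choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class SliceKernelSketch {
    // Hypothetical stand-in for the course-provided ByteArrayRange.
    static final class OffsetRange {
        final byte[] sequence;
        final int min;
        final int maxExclusive;

        OffsetRange(byte[] sequence, int min, int maxExclusive) {
            this.sequence = sequence;
            this.min = min;
            this.maxExclusive = maxExclusive;
        }
    }

    static void sliceKernel(List<OffsetRange> slices, byte[] sequence,
                            int min, int max, IntPredicate slicePredicate) {
        int length = max - min;
        // Assumption: the predicate answers "is this length long enough to divide?".
        // The length > 1 guard (added here) stops recursion on indivisible ranges.
        if (length > 1 && slicePredicate.test(length)) {
            int mid = min + length / 2;
            sliceKernel(slices, sequence, min, mid, slicePredicate); // lower half
            sliceKernel(slices, sequence, mid, max, slicePredicate); // upper half
        } else {
            // Short enough: record the range as one unit of work.
            slices.add(new OffsetRange(sequence, min, max));
        }
    }

    public static void main(String[] args) {
        List<OffsetRange> slices = new ArrayList<>();
        byte[] sequence = new byte[10]; // N = 10, so with K = 3 the offsets are [0, 8)
        sliceKernel(slices, sequence, 0, 8, length -> length > 2);
        for (OffsetRange r : slices) {
            System.out.println("[" + r.min + ", " + r.maxExclusive + ")");
        }
    }
}
```

With the predicate `length > 2`, the range [0, 8) is halved twice, yielding [0, 2), [2, 4), [4, 6), and [6, 8). Because the lower half is always processed first, the slices come out in ascending order.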

calculateReasonableThreshold

method: public static int calculateReasonableThreshold(List<byte[]> sequences, int k) Sequential.svg (sequential implementation only)

Define the threshold on slice length used by slicePredicate.

Question to ask yourself: given a list of sequences which are to be k-mer counted, how would you calculate a reasonable threshold?
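As one possible line of reasoning, and only as a hedged illustration: count the total number of k-mer offsets across all sequences and divide by the number of tasks you would like to end up with. The extra `desiredTaskCount` parameter is an assumption of this sketch (the assignment's signature takes only sequences and k); a small multiple of `Runtime.getRuntime().availableProcessors()` would be one plausible source for it. `ThresholdSketch` is a hypothetical class name.

```java
import java.util.List;

final class ThresholdSketch {
    // Hypothetical variant of calculateReasonableThreshold with an explicit task count.
    static int calculateReasonableThreshold(List<byte[]> sequences, int k, int desiredTaskCount) {
        if (desiredTaskCount <= 0) {
            throw new IllegalArgumentException("desiredTaskCount must be positive");
        }
        // Total work is measured in k-mer starting offsets, not in raw bytes.
        long totalOffsets = 0;
        for (byte[] sequence : sequences) {
            totalOffsets += Math.max(0, sequence.length - k + 1);
        }
        // Aim for roughly desiredTaskCount slices; never return less than 1.
        long threshold = totalOffsets / desiredTaskCount;
        return (int) Math.max(1L, Math.min((long) Integer.MAX_VALUE, threshold));
    }

    public static void main(String[] args) {
        List<byte[]> sequences = List.of(new byte[10], new byte[4], new byte[2]);
        // Offsets for k = 3: 8 + 2 + 0 = 10; with 5 desired tasks the threshold is 2.
        System.out.println(calculateReasonableThreshold(sequences, 3, 5));
    }
}
```

A threshold that is too small creates many tiny tasks whose overhead dominates, while one that is too large leaves the longest sequences undivided, so it is worth measuring a few candidates.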

Testing Your Solution

class: __BalanceTestSuite.java
package: kmer.balance.exercise
source folder: testing/src/test/java

Pledge, Acknowledgments, Citations

file: kmer-balance-pledge-acknowledgments-citations.txt

More info about the Honor Pledge