K Mer Balance Assignment

From CSE231 Wiki
Revision as of 02:01, 11 April 2023 by Cosgroved (talk | contribs) (Created page with "=Code To Implement= ==DefaultByteArrayRange== {{CodeToImplement|DefaultByteArrayRange|constructor<br/>originalCompleteSequence<br/>min<br/>maxExclusive|kmer.balance.exercise}}...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Code To Implement

DefaultByteArrayRange

class: DefaultByteArrayRange.java Java.png
methods: constructor
originalCompleteSequence
min
maxExclusive
package: kmer.balance.exercise
source folder: student/src/main/java

constructor and instance variables

originalCompleteSequence

min

maxExclusive

DefaultByteArrayRangeCreator

class: DefaultByteArrayRangeCreator.java Java.png
methods: apply
package: kmer.balance.exercise
source folder: student/src/main/java

apply(originalCompleteSequence, min, maxExclusive)

method: public ByteArrayRange apply(byte[] originalCompleteSequence, int min, int maxExclusive)) Sequential.svg (sequential implementation only)

DefaultKMerBalancer

DefaultKMerBalancer exists to balance the workload of a list of dna sub-sequences. We have balanced workload before by breaking up data into as equally sized slices in the Ranges assignment. We used the sliced up Ranges in everything from the Coarsening Nucleobase Count to Matrix MapReduce.

K-mer counting presents an additional challenge of having to deal with sub-sequences of radically different lengths.

class: DefaultKMerBalancer.java Java.png
methods: constructor
byteArrayRangeCreator
sliceKernel
createSlices
calculateReasonableThreshold
package: kmer.balance.exercise
source folder: student/src/main/java
Circle-information.svg Tip: Get DefaultKMerBalancer working and then use it to balance all of your parallel KMerCounter implementations.

constructor

In a bit of overdone abstraction, a byteArrayRangeCreator is passed to the constructor to be used instead of creating DefaultByteArrayRanges directly. This is done to allow the testing to help catch bugs earlier. Hang onto this byteArrayRangeCreator in an instance variable to use later.

byteArrayRangeCreator

Return the byteArrayRangeCreator passed to the constructor.

createSlices

method: public static List<Slice<byte[]>> createSlices(List<byte[]> sequences, int k, IntPredicate slicePredicate) Sequential.svg (sequential implementation only)

Attention niels epting.svg Warning: be sure to slice the offset (a.k.a. startingIndex) space

for N=10 and K=3, you should be slicing like this:

ThresholdSlices.svg

sliceKernel

method: private static void sliceKernel(List<ByteArrayRange> slices, byte[] sequence, int min, int max, IntPredicate slicePredicate) Sequential.svg (sequential implementation only)

Here we will employ a divide-and-conquer like strategy to fill the slices list with ByteArrayRanges that are all not too long. We leverage the provided slicePredicate to test whether or not a range length from min to max is long enough to warrant dividing further.

threshold calculation

method: public static int calculateReasonableThreshold(List<byte[]> sequences, int k) Sequential.svg (sequential implementation only)

Define the threashold of slice length used in slicePredicate.

Question to ask yourself: given a list of sequences which are to be k-mer counted, how would you calculate a reasonable threshold?

Testing Your Solution

Correctness

class: __BalanceTestSuite.java Junit.png
package: kmer.balance.exercise
source folder: testing/src/test/java