Half And Half Nucleobase Count Assignment
Motivation
- get our feet wet with parallel programming
- gain experience with the fork and join constructs
- take different approaches to splitting up the work
We will first solve the problem sequentially, then split the work up into 2 tasks.
Background
Bioinformatics
For this assignment, you will be writing sequential and parallel code to count nucleobases in a human X chromosome.
DNA is made up of four nucleobases: cytosine, guanine, adenine, and thymine. A strand of DNA can thus be represented as a string of letters representing these nucleobases, for example: “ACCGCATAAAGTCC.” However, DNA sequencing is typically not 100% accurate, so some of the nucleobases are not read with high certainty. These bases can be represented as an “N.” A sequence then might look something like “NCCGCATNAAGTCC.” Your goal is to write code that counts the number of occurrences of a particular nucleobase or of uncertain reads.
We will be using actual data pulled from the US National Library of Medicine, a database maintained by the National Institutes of Health. We have already provided the code that you need to access the chromosome from the database and check your work. You must implement a sequential solution and a parallel solution in which you split the work into two tasks.
For some more optional background on DNA and nucleotide bases, please check out
Parallel Programming
Mistakes to Avoid
Warning: Be sure to remove each NotYetImplementedException as you implement your solutions.
Warning: Do NOT copy the data. We are simply reading the chromosome data, so there is no reason to copy it.
Warning: Do NOT share data beyond what is necessary. Do NOT use static class fields when you could use local variables instead.
Warning: Do NOT have the same for loop code duplicated throughout your solutions. Invoke NucleobaseUtils.countRange where appropriate.
Getting Started
Code to Implement
Midpoint
class: | MidpointUtils.java | |
methods: | calculateMidpoint | |
package: | midpoint.exercise | |
source folder: | student/src/main/java |
method: public static int calculateMidpoint(int a, int b)
(sequential implementation only)
In this method, you will calculate the midpoint between two numbers, which is simply their average. There is no need to worry about rounding: if the midpoint is not a whole number, just drop everything after the decimal point.
It is hard to find 1D references for midpoint. Wolfram MathWorld has a breakdown of Midpoint in 2D and 3D. Pick any dimension you like.
Spoiler |
There are at least two correct ways to implement the midpoint:
|
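The list of approaches is not reproduced on this page. As an illustration only, here is a minimal sketch of two forms that are commonly used for an integer midpoint. The class name MidpointSketch and both method names are hypothetical; the method you actually implement is calculateMidpoint in MidpointUtils.

```java
// Hypothetical stand-in class; the assignment's real class is MidpointUtils.
final class MidpointSketch {
    // Form 1: average the two values. Integer division drops the fractional part.
    static int midpointByAverage(int a, int b) {
        return (a + b) / 2;
    }

    // Form 2: start at a and step half of the distance toward b.
    // This avoids overflow in (a + b) when both values are very large.
    static int midpointByOffset(int a, int b) {
        return a + (b - a) / 2;
    }

    public static void main(String[] args) {
        System.out.println(midpointByAverage(0, 10)); // (0 + 10) / 2 = 5
        System.out.println(midpointByOffset(3, 8));   // 3 + (5 / 2) = 3 + 2 = 5
    }
}
```

For non-negative array indices with a no greater than b, which is the case in this assignment, the two forms return the same value. They can differ by one when negative numbers are involved, because Java integer division truncates toward zero.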
Count Range
class: | NucleobaseUtils.java | |
methods: | countRange | |
package: | count.exercise | |
source folder: | student/src/main/java |
method: public static int countRange(byte[] chromosome, Nucleobase targetNucleobase, int min, int maxExclusive)
(sequential implementation only)
Note: Nucleobase has a toByte() method.
This utility method should count the number of times a particular nucleobase occurs in the array within the half-open range [min, maxExclusive), that is, at indices from min up to but not including maxExclusive.
Spoiler |
Use a for loop to iterate from min to maxExclusive. |
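To make the half-open range concrete, here is a minimal sketch of the loop on the short example sequence from the Background section. The enum Base is a hypothetical stand-in for the course's Nucleobase type, and it assumes each base is stored as its uppercase ASCII letter; only the existence of toByte() comes from this page.

```java
import java.nio.charset.StandardCharsets;

final class CountRangeSketch {
    // Hypothetical stand-in for Nucleobase. Assumption: bases are uppercase ASCII letters.
    enum Base {
        ADENINE('A'), CYTOSINE('C'), GUANINE('G'), THYMINE('T'), UNCERTAIN('N');

        private final byte code;

        Base(char letter) {
            this.code = (byte) letter;
        }

        byte toByte() {
            return code;
        }
    }

    // Counts matches at indices min, min + 1, ..., maxExclusive - 1.
    static int countRange(byte[] chromosome, Base target, int min, int maxExclusive) {
        byte targetByte = target.toByte();
        int count = 0;
        for (int i = min; i < maxExclusive; i++) {
            if (chromosome[i] == targetByte) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] chromosome = "NCCGCATNAAGTCC".getBytes(StandardCharsets.US_ASCII);
        // Whole array: 5 cytosines.
        System.out.println(countRange(chromosome, Base.CYTOSINE, 0, chromosome.length));
        // First half [0, 7) holds 3 of them, second half [7, 14) holds 2.
        System.out.println(countRange(chromosome, Base.CYTOSINE, 0, 7));
        System.out.println(countRange(chromosome, Base.CYTOSINE, 7, 14));
    }
}
```

Because the upper bound is exclusive, adjacent ranges such as [0, 7) and [7, 14) cover every index exactly once, so their counts can simply be added.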
SequentialNucleobaseCounter
class: | SequentialNucleobaseCounter.java | |
methods: | count | |
package: | count.exercise | |
source folder: | student/src/main/java |
method: public int count(byte[] chromosome, Nucleobase targetNucleobase)
(sequential implementation only)
This solution should be achieved with a simple call to one of the utility methods you wrote.
Spoiler |
Invoke NucleobaseUtils.countRange from 0 to the chromosome array length. |
Static Method Invocation Demo
HalfAndHalfNucleobaseCounter
class: | HalfAndHalfNucleobaseCounter.java | |
methods: | count | |
package: | count.exercise | |
source folder: | student/src/main/java |
method: public int count(byte[] chromosome, Nucleobase targetNucleobase)
(parallel implementation required)
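The general shape of a two-task solution can be sketched as follows. This page does not show the course's own fork and join API, so the sketch uses the standard library's CompletableFuture as a stand-in: supplyAsync plays the role of fork and join() plays the role of join. The class name, the method names, and the use of a raw byte target are assumptions made for illustration; your solution should use the course constructs and NucleobaseUtils.countRange.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

final class HalfAndHalfSketch {
    // Stand-in for NucleobaseUtils.countRange, taking the target as a raw byte.
    static int countRange(byte[] chromosome, byte target, int min, int maxExclusive) {
        int count = 0;
        for (int i = min; i < maxExclusive; i++) {
            if (chromosome[i] == target) {
                count++;
            }
        }
        return count;
    }

    static int countHalfAndHalf(byte[] chromosome, byte target) {
        // Midpoint of 0 and the array length splits the work into two ranges.
        int mid = chromosome.length / 2;

        // "Fork": count the lower half [0, mid) in another thread.
        CompletableFuture<Integer> lowerHalf =
                CompletableFuture.supplyAsync(() -> countRange(chromosome, target, 0, mid));

        // Meanwhile, count the upper half [mid, length) in the current thread.
        int upperCount = countRange(chromosome, target, mid, chromosome.length);

        // "Join": wait for the forked task and collect its result.
        int lowerCount = lowerHalf.join();

        return lowerCount + upperCount;
    }

    public static void main(String[] args) {
        byte[] chromosome = "NCCGCATNAAGTCC".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countHalfAndHalf(chromosome, (byte) 'C')); // 3 + 2 = 5
    }
}
```

Note how the sketch follows the warnings above: both tasks read the same array without copying it, the partial counts live in local variables rather than static fields, and the loop exists only once, inside countRange.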
Client
class: | HalfAndHalfNucleobaseCounterClient.java | DEMO: |
methods: | main | |
package: | count.client | |
source folder: | src/main/java |
Testing Your Solution
Making sense of JUnit output:
Correctness
class: | _HalfAndHalfNucleobaseCounterTestSuite.java | |
package: | count.exercise | |
source folder: | testing/src/test/java |
Launch _HalfAndHalfNucleobaseCounterTestSuite.java as a JUnit Test to run all of the tests. It is located in testing/src/test/java within the count.exercise package. To start it, right-click _HalfAndHalfNucleobaseCounterTestSuite.java and select "Run _HalfAndHalfNucleobaseCounterTestSuite".
Pledge, Acknowledgments, Citations
file: | nucleobase-count-half-and-half-pledge-acknowledgments-citations.txt |
More info about the Honor Pledge