Half And Half Nucleobase Count Assignment

From CSE231 Wiki

Motivation

  • get our feet wet with parallel programming
  • gain experience with the fork and join constructs
  • take different approaches to splitting up the work

We will first solve the problem sequentially, then split the work up into two tasks.

Background

Bioinformatics

For this assignment, you will be writing sequential and parallel code to count nucleobases in a human X chromosome.

DNA is made up of four nucleobases: cytosine, guanine, adenine, and thymine. A strand of DNA can thus be represented as a string of letters representing these nucleobases, for example: “ACCGCATAAAGTCC”. However, DNA sequencing is typically not 100% accurate, so some of the nucleobases are not read with high certainty. These bases can be represented as an “N”. A sequence then might look something like “NCCGCATNAAGTCC”. Your goal is to write code that counts the number of occurrences of a particular nucleobase or of uncertain reads.
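To make the representation concrete, here is a minimal sketch that counts letters in the example sequence above. It works on a plain String for illustration only; the class name NucleobaseStringSketch and the method countLetter are hypothetical and are not part of the provided code, which works on a byte[] instead.

```java
class NucleobaseStringSketch {
    // Counts how many characters of the sequence equal the target letter.
    // Hypothetical helper for illustration; the assignment itself uses byte[] data.
    static int countLetter(String sequence, char target) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sequence = "NCCGCATNAAGTCC"; // example strand from the text
        System.out.println("N: " + countLetter(sequence, 'N')); // prints "N: 2"
        System.out.println("C: " + countLetter(sequence, 'C')); // prints "C: 5"
    }
}
```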

We will be using actual data pulled from the US National Library of Medicine, a database maintained by the National Institutes of Health. We have already provided you with the code that you need to access the chromosome from the database and check your work. You must implement a sequential solution and a parallel solution where you split the work into two tasks.

For some more optional background on DNA and nucleotide bases, please check out

Parallel Programming

Mistakes to Avoid

Warning: Be sure to remove each NotYetImplementedException as you implement your solutions.
Warning: Do NOT copy the data. We are simply reading the chromosome data, so there is no reason to copy it.
Warning: Do NOT share data beyond what is necessary. Do NOT use static class fields when you could use local variables instead.
Warning: Do NOT have the same for loop code duplicated throughout your solutions. Invoke NucleobaseUtils.countRange where appropriate.

Getting Started

Code to Implement

Midpoint

class: MidpointUtils.java
methods: calculateMidpoint
package: midpoint.exercise
source folder: student/src/main/java

method: public static int calculateMidpoint(int a, int b) (sequential implementation only)

In this method, you will calculate the midpoint between two numbers, which is simply their average. There is no need to worry about rounding: if the average is not a whole number, just drop everything after the decimal point.
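The averaging described above can be sketched as follows. This is one possible approach, not necessarily the intended solution: the class name MidpointSketch is hypothetical, and widening to long before adding is an extra precaution (not something the assignment asks for) so that the sum of two large ints cannot overflow. Java integer division truncates toward zero, which is what drops everything after the decimal point.

```java
class MidpointSketch {
    // Averages a and b, truncating toward zero.
    // The sum is computed as a long (an assumption of this sketch) to avoid int overflow.
    static int calculateMidpoint(int a, int b) {
        long sum = (long) a + b;
        return (int) (sum / 2);
    }

    public static void main(String[] args) {
        System.out.println(calculateMidpoint(0, 10)); // prints 5
        System.out.println(calculateMidpoint(0, 7));  // prints 3 (3.5 with the .5 dropped)
    }
}
```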

It is hard to find 1D references for midpoint. Wolfram MathWorld has a breakdown of Midpoint in 2D and 3D. Pick any dimension you like.

Count Range

class: NucleobaseUtils.java
methods: countRange
package: count.exercise
source folder: student/src/main/java

method: public static int countRange(byte[] chromosome, Nucleobase targetNucleobase, int min, int maxExclusive) (sequential implementation only)

Note: Nucleobase has a toByte() method.

This utility method should count the number of times a particular nucleobase occurs in the array within the half-open index range [min, maxExclusive), that is, from index min up to but not including index maxExclusive.
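A minimal sketch of counting over a half-open range is shown below. The enum SketchNucleobase is a hypothetical stand-in for the course's Nucleobase type; only the existence of toByte() comes from the note above. The sketch also assumes that each base is stored as its ASCII letter, which may not match the provided data exactly.

```java
import java.nio.charset.StandardCharsets;

class CountRangeSketch {
    // Hypothetical stand-in for the course's Nucleobase type.
    // Assumption: each base is stored as the byte value of its ASCII letter.
    enum SketchNucleobase {
        ADENINE('A'), CYTOSINE('C'), GUANINE('G'), THYMINE('T'), UNKNOWN('N');

        private final byte value;

        SketchNucleobase(char letter) {
            this.value = (byte) letter;
        }

        byte toByte() {
            return value;
        }
    }

    // Counts matches at indices min, min + 1, ..., maxExclusive - 1.
    static int countRange(byte[] chromosome, SketchNucleobase target, int min, int maxExclusive) {
        byte targetByte = target.toByte(); // look up the byte once, outside the loop
        int count = 0;
        for (int i = min; i < maxExclusive; i++) {
            if (chromosome[i] == targetByte) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] chromosome = "NCCGCATNAAGTCC".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countRange(chromosome, SketchNucleobase.CYTOSINE, 0, 7));  // prints 3
        System.out.println(countRange(chromosome, SketchNucleobase.CYTOSINE, 7, 14)); // prints 2
    }
}
```

Because the upper bound is exclusive, adjacent ranges such as [0, 7) and [7, 14) cover every index exactly once, so their counts can simply be added.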

SequentialNucleobaseCounter

class: SequentialNucleobaseCounter.java
methods: count
package: count.exercise
source folder: student/src/main/java

method: public int count(byte[] chromosome, Nucleobase targetNucleobase) (sequential implementation only)

This solution should be achieved with a simple call to one of the utility methods you wrote.

Static Method Invocation Demo

HalfAndHalfNucleobaseCounter

class: HalfAndHalfNucleobaseCounter.java
methods: count
package: count.exercise
source folder: student/src/main/java

method: public int count(byte[] chromosome, Nucleobase targetNucleobase) (parallel implementation required)

Figure: Half and half
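The half-and-half idea (count the lower half in one task and the upper half in another, then add the results) can be sketched with the standard library. This sketch substitutes java.util.concurrent.CompletableFuture for the course's own fork and join constructs, so it illustrates the pattern and is not the API you are expected to use. The class name HalfAndHalfSketch is hypothetical, and a plain byte target stands in for targetNucleobase.toByte().

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class HalfAndHalfSketch {
    // Counts matches in the half-open index range [min, maxExclusive).
    static int countRange(byte[] chromosome, byte target, int min, int maxExclusive) {
        int count = 0;
        for (int i = min; i < maxExclusive; i++) {
            if (chromosome[i] == target) {
                count++;
            }
        }
        return count;
    }

    static int countHalfAndHalf(byte[] chromosome, byte target) {
        int mid = chromosome.length / 2;

        // "Fork": start counting the lower half asynchronously.
        // Both tasks only read the shared array; nothing is copied.
        CompletableFuture<Integer> lowerHalf =
                CompletableFuture.supplyAsync(() -> countRange(chromosome, target, 0, mid));

        // Meanwhile, count the upper half on the current thread.
        int upperCount = countRange(chromosome, target, mid, chromosome.length);

        // "Join": wait for the lower half to finish, then combine the two counts.
        return lowerHalf.join() + upperCount;
    }

    public static void main(String[] args) {
        byte[] chromosome = "NCCGCATNAAGTCC".getBytes(StandardCharsets.US_ASCII);
        int parallel = countHalfAndHalf(chromosome, (byte) 'C');
        int sequential = countRange(chromosome, (byte) 'C', 0, chromosome.length);
        System.out.println(parallel + " == " + sequential); // prints "5 == 5"
    }
}
```

Note that the loop lives only in countRange and that all intermediate values are local variables, in line with the warnings under Mistakes to Avoid.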

Client

class: HalfAndHalfNucleobaseCounterClient.java (DEMO)
methods: main
package: count.client
source folder: src/main/java

Testing Your Solution

Making sense of JUnit output:

Correctness

class: _HalfAndHalfNucleobaseCounterTestSuite.java
package: count.exercise
source folder: testing/src/test/java

Launch _HalfAndHalfNucleobaseCounterTestSuite.java as a JUnit Test to run all of the tests. To do so, right-click _HalfAndHalfNucleobaseCounterTestSuite.java (in testing/src/test/java, within the count.exercise package) and select "Run _HalfAndHalfNucleobaseCounterTestSuite".

Pledge, Acknowledgments, Citations

file: nucleobase-count-half-and-half-pledge-acknowledgments-citations.txt

More info about the Honor Pledge