Nucleobase Counting

From CSE231 Wiki
Revision as of 20:34, 21 July 2017 by Benjaminchoi (talk | contribs)

Background

For this assignment, you will be writing sequential and parallel code to count nucleobases in a human X chromosome.

DNA is made up of four nucleobases: cytosine, guanine, adenine, and thymine. A strand of DNA can thus be represented as a string of letters representing these nucleobases, for example: “ACCGCATAAAGTCC.” However, DNA sequencing is typically not 100% accurate, so some of the nucleobases are not read with high certainty. These bases can be represented as an “N.” A sequence then might look something like “NCCGCATNAAGTCC.” Your goal is to write code that counts the number of unknown nucleobases.

We will be using actual data pulled from a database of the US National Library of Medicine, which is part of the National Institutes of Health. We have already provided the code that you need to access the chromosome from the database and check your work. You must implement one sequential solution and two parallel solutions to count the given bases in these sequences.

For some more optional background on DNA and nucleotide bases, please refer to the links under Optional Reading.

Where to Start

The starting point for this assignment is in the count folder. In the count.assignment package, you will find NucleobaseCounting.java.

Sequential Solution

We recommend implementing a sequential solution before moving on to parallel solutions. In order to do this, please modify the countSequential method. When you’re ready to begin, delete the return statement and begin implementing your solution! Please refer to your notes from lecture for help, as we tackled a very similar problem in class.

Hint: a loop is a natural fit for counting things.
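To make the idea of a counting loop concrete, here is a minimal sketch in plain Java. It is not the provided starter code: the class name SequentialCountSketch, the method countRange, and the choice of a byte array to hold the sequence are all assumptions made for illustration, and the real countSequential signature may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; this is a sketch, not the assignment's API.
class SequentialCountSketch {
    /** Counts how many entries in the half-open range [min, max) equal target. */
    static int countRange(byte[] sequence, byte target, int min, int max) {
        int count = 0;
        for (int i = min; i < max; i++) {
            if (sequence[i] == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The example strand from the Background section, with two unknown bases.
        byte[] sample = "NCCGCATNAAGTCC".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countRange(sample, (byte) 'N', 0, sample.length)); // prints 2
    }
}
```

Writing the loop over a range rather than the whole array is a design choice: the same helper can later be reused on each piece of the array in a parallel version.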

Parallel Solution

You need to implement two different parallel solutions to this problem. The first will involve splitting the array into two equal halves, then going through each half of the array in parallel. The second will involve splitting the array into n different pieces, then going through each of those pieces of the array in parallel.

In order to start with the two equal halves solution, please modify the countParallelUpperLowerSplit method. When you’re ready to begin, delete the return statement and begin implementing your solution! Again, please refer to your notes from lecture for help, as we tackled a very similar problem in class.

Hint: don’t forget the finish block! The format will be very similar to the examples.
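The shape of a two-halves split can be sketched without Habanero. The example below swaps in plain java.lang.Thread: starting the helper thread stands in for launching an asynchronous task, and join() stands in for the end of the finish block. All names (TwoHalvesCountSketch, countTwoHalves, countRange) and the byte array representation are assumptions for illustration, so treat this as an analogy to the lecture examples rather than the expected solution.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names; plain threads are used here in place of Habanero constructs.
class TwoHalvesCountSketch {
    /** Counts how many entries in the half-open range [min, max) equal target. */
    static int countRange(byte[] sequence, byte target, int min, int max) {
        int count = 0;
        for (int i = min; i < max; i++) {
            if (sequence[i] == target) {
                count++;
            }
        }
        return count;
    }

    static int countTwoHalves(byte[] sequence, byte target) {
        int mid = sequence.length / 2;
        int[] lower = new int[1]; // slot written only by the helper thread

        // The lower half is counted by a helper thread...
        Thread helper = new Thread(() -> lower[0] = countRange(sequence, target, 0, mid));
        helper.start();

        // ...while the upper half is counted by the current thread.
        int upper = countRange(sequence, target, mid, sequence.length);

        try {
            // Wait for the helper before reading its result; skipping this wait
            // is the same mistake as forgetting the finish block.
            helper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the lower half", e);
        }
        return lower[0] + upper;
    }

    public static void main(String[] args) {
        byte[] sample = "NCCGCATNAAGTCC".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countTwoHalves(sample, (byte) 'N')); // prints 2
    }
}
```

Each half writes to its own variable, so the two tasks never update a shared counter at the same time.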

After you’re finished with that, please modify the countParallelNWaySplit method for the n different pieces solution. As before, when you’re ready to begin, delete the return statement and begin implementing your solution! Again, please refer to your notes from lecture for help, as we tackled a very similar problem in class.

Hint: make an array to store the results of each task. Split the array into n different chunks, each of which contains roughly 1/n of the elements. Once each task has summed up the results from its chunk, add the results from each chunk to get your final answer.
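The hint above can be sketched as follows, again with plain Java threads swapped in for Habanero tasks. The names (NWaySplitCountSketch, countNWay, numTasks) and the byte array representation are hypothetical. The chunk boundaries are computed in long arithmetic as an assumption-driven precaution: a chromosome can hold well over a hundred million bases, so multiplying an index by the array length could overflow an int.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names; plain threads are used here in place of Habanero constructs.
class NWaySplitCountSketch {
    /** Counts how many entries in the half-open range [min, max) equal target. */
    static int countRange(byte[] sequence, byte target, int min, int max) {
        int count = 0;
        for (int i = min; i < max; i++) {
            if (sequence[i] == target) {
                count++;
            }
        }
        return count;
    }

    static int countNWay(byte[] sequence, byte target, int numTasks) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("numTasks must be at least 1");
        }
        int[] partial = new int[numTasks];      // one result slot per task
        Thread[] workers = new Thread[numTasks];

        for (int i = 0; i < numTasks; i++) {
            final int task = i;
            // Chunk i covers [i * length / n, (i + 1) * length / n), so the chunks
            // tile the array exactly even when length is not divisible by n.
            int min = (int) ((long) task * sequence.length / numTasks);
            int max = (int) ((long) (task + 1) * sequence.length / numTasks);
            workers[task] = new Thread(
                    () -> partial[task] = countRange(sequence, target, min, max));
            workers[task].start();
        }

        int total = 0;
        for (int i = 0; i < numTasks; i++) {
            try {
                workers[i].join(); // wait for task i before reading its slot
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for task " + i, e);
            }
            total += partial[i];
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] sample = "NCCGCATNAAGTCC".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countNWay(sample, (byte) 'N', 4)); // prints 2
    }
}
```

With 14 elements and 4 tasks, the chunks are [0, 3), [3, 7), [7, 10), and [10, 14), which shows why each chunk holds only roughly 1/n of the elements.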

Running Your Solution

Habanero requires a special VM argument. Running your solution the first time will automatically place what you need into the copy buffer. Follow these steps to give Habanero what it wants.

If you need any further help or clarification, please don’t hesitate to post to Piazza and/or reach out to one of the instructors.

Rubric

Total points: 100

  • Assignment passes all tests (50)
  • Methods follow expected guidelines (40)
  • Code is properly styled and cited, if applicable (10)

Optional Reading