Nucleobase Counting

From CSE231 Wiki
Jump to navigation Jump to search

Motivation

  • get our feet wet with parallel programming
  • gain experience with the async and finish constructs
  • take different approaches to splitting up the work

We will solve the problem sequentially, then split the work up into 2 tasks, then coarsen the work n-ways, and finally split up the work in a divide-and-conquer recursive style.

Background

Bioinformatics

For this assignment, you will be writing sequential and parallel code to count nucleobases in a human X chromosome.

DNA is made up of four nucleobases: cytosine, guanine, adenine, and thymine. A strand of DNA can thus be represented as a string of letters representing these nucleobases, for example: “ACCGCATAAAGTCC.” However, DNA sequencing is typically not 100% accurate, so some of the nucleobases are not read with high certainty. These bases can be represented as an “N.” A sequence then might look something like “NCCGCATNAAGTCC.” Your goal is to write code that counts the number of occurrences a particular nucleobase or uncertain reads.

We will be using actual data pulled from the US National Library of Medicine, a database maintained by the National Institute of Health. We have already provided you the code that you need to access the chromosome from the database and check your work. You must implement a sequential solution and three parallel solutions to count the given bases in these sequences.

For some more optional background on DNA and nucleotide bases, please check out

Parallel Programming

Parallel Constructs

Related Videos