Syllabus for CSE 584A “Algorithms for Biological Sequence Comparison”

This syllabus was updated on 3/20/2020 to accommodate the university's move to online instruction for rest of the Spring 2020 semester. Changes are in bold.

Contents

Course Goals and Content

My goal in this class is to give you enough background to understand and apply efficient algorithms for rapidly searching very large bodies of textual data, particularly biological (DNA, RNA, and protein) sequences. Rather than dwell too much on specific biological motivations, we will focus on key algorithmic results that have been successfully applied to problems of biosequence and other textual comparisons. Michael Brent's CSE587A / BIO5495 course is a great place to get a broader perspective on important questions in computational biology.

Biosequence comparison is a challenging computational task because the data volumes are enormous! We'll be studying cutting-edge computational tools for building sequence indices in small space, which is extremely relevant to searching a collection of short ``next-generation'' DNA sequence reads. However, the same techniques, with some tweaks, work for text of all kinds -- web pages, paper abstracts, Chinese newspapers, you name it.

Your primary source for class notes, assignments, and handouts is the course website. Announcements and discussions will be distributed via the class Piazza board. We will also post Zoom links for class meetings and office hours, as well as links to recorded class meetings, on Piazza. We will use Canvas to maintain the gradebook and to turn in and grade written homework, and Github Classroom to turn in coding assignments.

This course has no required textbook. I will use a combination of my own class notes (which will be posted to the course website) and selected papers and review articles from the literature. The following optional texts may be helpful as reference material:

Prerequisites

Formally, this class requires CSE 347. In particular, I expect that you are comfortable with formally proving correctness and running time results for algorithms. We will use some techniques, such as dynamic programming, that were introduced in that class, though you may be able to pick them up even if you have only had 247 or the equivalent. I also assume that you are familiar with basic algorithms for sorting, searching, and hashing, as well as simple collection types like binary trees.

Really, what this class requires is mathematical maturity -- the ability to read, understand, and produce rigorous mathematical arguments. We will use basic logical techniques such as induction, proof by contradiction, and proof by cases extensively, as well as elementary probability and combinatorics. You will communicate your ideas in the form of proofs through your classroom interactions and homeworks. Many of the algorithms we study are rather complex, so it is important to build up confidence in their correctness by writing down proofs before trying to use them. Following the proofs is also a really good way to get comfortable with how the algorithms work, so that you can implement them successfully as part of your homework.

Opportunities to Interact and Seek Help

We will continue to hold class meetings synchronously at the regular time, twice a week, via Zoom. The Zoom link will be available from Piazza. For now, I will use the PDF notes that I usually post before class, possibly with some additional prepared figures, as a basis for class discussions. If that doesn't work out, I will consider recording lectures asynchronously... let's see how it goes.

Every class meeting will be recorded in Zoom, and a link to the recording will be made available after class via Piazza. But please also let me know if you are regularly going to be unable to make our usual class time, so I can try to accommodate you.

Besides our regular class meetings, the instructor and TA will hold weekly office hours. You can find a schedule of office hours and a Zoom link for them on Piazza. If our times do not work for you but you want to chat, please let us know so we can try to accommodate you.

You can also interact with us through the class Piazza discussion board. You can access our Piazza board through Canvas or directly at https://piazza.com/wustl/spring2020/cse584a/home. We will also use Piazza to maintain up-to-date information about office hours and to post class announcements.

All students will be automatically signed up for Piazza by our first day of class. If you have not yet registered on Piazza, you should receive an email inviting you to join. Please register as soon as you receive the e-mail, if you haven't already.

You should use Piazza for all questions not of a confidential nature. Your posts will be seen by the instructor, the TA, and (if you make your post public) by your fellow students. You will find Piazza helpful when your classmates have already asked a question for you -- in fact, someone may already have answered it! But, in order to sustain this, you must ask your questions on Piazza as well.

You can post personal questions on Piazza and make the post private to the instructor only. If you have a question about homework or class logistics that you don't want to share with the class, please post it to Piazza, but leave your post private. All other questions should be posted publicly. You may make your posts anonymous to other students if you wish.

Accommodations

I will make reasonable accommodations for students who require extensions for medical or religious reasons.

Students with documented disabilities are strongly encouraged both to bring any need for accommodation to the attention of the instructor and to make full use of the University's disability resources and accommodations.

Assignments and Grading

There are two types of assignment in this class: homeworks, and a final course project. There are no exams and no scheduled final! Homeworks will give you opportunities to think more about the algorithms and ideas presented in class. There will be around four of them, each containing written and/or programming parts. The final project will entail implementing one or more of the algorithms discussed in class to build a useful bioinformatics application.

Electronic Submission of Homework

We will use Canvas for paperless homework submission. Yes, I like Gradescope too, but I need Canvas in order to support the peer grading approach I use for homeworks. Coding assignments, including the final project, will be turned in through Git repositories created for each student on GitHub Classroom.

To allow for electronic submission, grading, and enforcement of our collaboration policy, I require that you prepare written homework using a word processor or typesetting program, rather than hand-writing it. (You can, however, hand-draw and scan in figures as part of your solutions.) You can read about how to typeset your homework, including math and figures, and the rules and procedures for submission, in our eHomework Guide.

Typesetting your homework takes a little more time, at least initially, but you will find it helpful for problem solving. We don't hand-write papers anymore because word processors offer huge advantages in terms of editing your work, particularly rearranging, reusing, and replacing pieces over time. Think of your homework solutions as mini-essays (which, in a certain sense, they are!), and you'll see why it might be helpful to write them electronically. Don't try to hand-write a beautiful solution and then typeset it at the last minute -- use your word processor or editor to capture your ideas and compose a solution as you go.

Deadlines and Late Homework Policy

Homework will be due at 2:30 PM (class time) on its due date. If all problems for a given homework are not submitted through their respective mechanisms by this deadline, the entire homework will be considered late. Homeworks turned in fewer than 24 hours after deadline are one day late; those turned in at least 24 but less than 48 hours after deadline are two days late; and so forth. Every day or partial day that a homework is late -- including weekends, holidays, and university breaks -- counts toward its total lateness.

Homeworks that are submitted before the deadline but resubmitted after it will be scored according to the date of the latest submission.

No homework will be accepted more than three days late (that is, 72 hours or more) without a previously agreed-on extension. Homeworks turned up to three days late will be accepted but may not receive full credit. More late policy details will be forthcoming based on the observed rate of late submissions for the first homework or two.

Breakdown of Final Grade

The breakdown of your final grade will be as follows:

Some aspects of grading, especially for final projects, are necessarily subjective. In general, however, I'm looking for

To ensure timely grading, I will ask for student volunteers to help me grade written homework problems. Volunteers should post their solutions as private Piazza posts well in advance of the due date for me to evaluate. If you are chosen to grade a problem, you will receive extra credit equal to one-half that problem's value the homework you are grading.

The university is implementing late drop and late pass-fail options for Spring 2020. Details are not completely settled yet (as of 3/20). If you choose to convert this class to pass-fail, the expectations to receive a passing grade are as follows:

  1. Obtain a final grade of at least C-.
  2. Make a sincere effort to complete each assignment and the final project.
Please do not completely skip assignments just because you are pass-fail.

Collaboration and Academic Integrity Policy

Please see this page for the policy.