Difference between revisions of "Module 4"

(Copying template from Module 3)
(Fully updating Module 4)

Revision as of 00:34, 21 April 2013

In Module 4, you will learn the fundamentals of Python, a popular scripting language, and Regular Expressions, which give you power in text processing.

This article contains your assignments for Module 4.

Reading

The following articles on the online class wiki textbook contain information that will help you complete the assignments.

  • Python
  • Regular Expressions

Individual Assignments

Learn Python and Regular Expressions

Up to this point in CSE 330, you have used only one programming language: PHP. In Modules 4 and 5, you will learn a new language: Python. Before starting this assignment, you should look over the online textbook article about Python. Additionally, you may find it helpful to look over the online textbook article about Regular Expressions.

Write some Regular Expressions

Please write regular expressions that do each of the following:

  1. Match an input string that contains at least one occurrence of the string "hello world"
  2. Find all words in an input string that contain a triple vowel
  3. Match an input string that is entirely a flight code, of the format AA####, where AA is a two-letter uppercase airline code, and #### is a three- or four-digit flight number

Save these in a text file that you can show the TA on demo day.
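
One way to check candidate patterns before demo day is to try them with Python's re module. The sketch below is only one possible reading of the three tasks: it assumes "triple vowel" means three consecutive vowels (a, e, i, o, u), and the pattern names and helper functions are hypothetical, not part of the assignment.

```python
import re

# Task 1: the literal text "hello world" anywhere in the input.
HELLO = re.compile(r"hello world")

# Task 2 (assumption): a word containing three consecutive vowels.
TRIPLE_VOWEL = re.compile(r"\b\w*[aeiou]{3}\w*\b", re.IGNORECASE)

# Task 3: two uppercase letters followed by three or four digits.
FLIGHT = re.compile(r"[A-Z]{2}\d{3,4}")


def contains_hello(text):
    # search() succeeds if the pattern occurs anywhere in the string.
    return HELLO.search(text) is not None


def triple_vowel_words(text):
    # findall() returns every non-overlapping whole-word match.
    return TRIPLE_VOWEL.findall(text)


def is_flight_code(text):
    # fullmatch() requires the entire string to match the pattern.
    return FLIGHT.fullmatch(text) is not None


print(contains_hello("well, hello world!"))
print(triple_vowel_words("a beautiful sunny day"))
print(is_flight_code("AA1234"), is_flight_code("AA12"))
```

Using fullmatch() for the flight code keeps the pattern itself free of anchors; the same effect can be had by writing ^ and $ in the pattern and calling search().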

Baseball Stats Counter

The St. Louis Cardinals are the most legendary baseball team in the National League. In this exercise, you will create a Python script that reads box scores from a file and computes the Cardinals players' batting averages for a particular season.

Tips and Instructions

  • You should use Python when solving this problem.
  • You should use a regular expression to parse players' names, at-bats, hits, and runs from the input file.
  • You may want to create a class to hold and compute information about each player.
  • Your script should take one command-line argument: the path to an input file. If no path is given, your program should print a usage message.
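
The command-line tip above might be handled along these lines. This is a minimal sketch: the function name main, the wording of the usage message, and the exit codes are assumptions rather than requirements of the assignment.

```python
import sys


def main(argv):
    # argv[0] is the script name, so exactly one argument means len(argv) == 2.
    if len(argv) != 2:
        print("Usage: python {} <input-file>".format(argv[0]))
        return 1
    path = argv[1]
    # Placeholder for the real work of reading and processing the file.
    print("Would read box scores from: {}".format(path))
    return 0


# In a real script, the last line would typically be:
# sys.exit(main(sys.argv))
```

Keeping the logic in a function that receives argv makes it easy to try both cases without actually launching the script from a shell.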

Input

A sample input file (for the 1940 season) can be found at: http://classes.engineering.wustl.edu/cse330/content/cardinals-1940.txt

The top of the file contains a citation, and what follows is an entire season of Cardinals baseball box scores. Each game starts with a title; for example:

=== St.Louis Cardinals vs. Chicago Cubs, 1940-04-19 ===

What follows is the performance of each player in that game. The format of each line is as follows:

XXX batted # times with # hits and # runs

where XXX is the player's name (you may assume that the name is unique), and each # is an integer. The three integers represent the number of "at bats", the number of "hits", and the number of "runs", respectively.
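
A regular expression with named groups is one way to pull the four fields out of a line in this format. The sketch below is illustrative only: the pattern, the function name parse_line, and the sample line are made up for the example and are not taken from the input file.

```python
import re

# One named group per field; the name is matched lazily so that it stops
# at the first " batted " that is followed by the rest of the format.
LINE = re.compile(
    r"^(?P<name>.+?) batted (?P<at_bats>\d+) times "
    r"with (?P<hits>\d+) hits and (?P<runs>\d+) runs$"
)


def parse_line(line):
    """Return (name, at_bats, hits, runs), or None for non-player lines."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("name"),
        int(match.group("at_bats")),
        int(match.group("hits")),
        int(match.group("runs")),
    )


# Hypothetical sample line, followed by a game title that should not match.
print(parse_line("Marty Marion batted 4 times with 1 hits and 0 runs"))
print(parse_line("=== St.Louis Cardinals vs. Chicago Cubs, 1940-04-19 ==="))
```

Returning None for lines that do not match lets the caller skip the citation, the game titles, and blank lines without special cases.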

Output

Your script needs to output each player's batting average across the season. A player's batting average is that player's total hits divided by that player's total at-bats throughout the entire season. For example, Marty Marion was at bat 435 times in 1940, and he had 121 hits, so his batting average was 0.278.
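
The season totals could be accumulated in a small class, as the tips suggest. The following is a sketch under stated assumptions: the class name Player and its methods are hypothetical, and returning 0.0 for a player with no at-bats is a choice made here, since the assignment does not say how to treat that case.

```python
class Player:
    """Accumulates one player's at-bats and hits over a season."""

    def __init__(self, name):
        self.name = name
        self.at_bats = 0
        self.hits = 0

    def add_game(self, at_bats, hits):
        # Add the numbers from a single box-score line to the totals.
        self.at_bats += at_bats
        self.hits += hits

    def batting_average(self):
        # Assumption: a player with no at-bats is reported as 0.0.
        if self.at_bats == 0:
            return 0.0
        # float() forces true division on Python 2 as well as Python 3.
        return self.hits / float(self.at_bats)


# The season totals quoted in the example above, entered as one "game".
marion = Player("Marty Marion")
marion.add_game(435, 121)
print(round(marion.batting_average(), 3))  # 0.278
```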

Each line in the output should have the format:

XXX: #.###

where XXX is the player's name, and #.### is the player's batting average, rounded to three decimal places.

The players should be sorted by batting average, with the highest batting average on top.
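
Sorting and formatting might look like the sketch below. The function name format_report and the sample data are invented for illustration. The order of players with equal averages is not specified by the assignment; here ties simply keep the order in which the sort received them.

```python
def format_report(averages):
    """Return 'Name: 0.123' lines, highest batting average first."""
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    # {:.3f} rounds to three decimal places and keeps trailing zeros.
    return ["{}: {:.3f}".format(name, avg) for name, avg in ranked]


# Hypothetical averages, not taken from the 1940 data.
sample = {"Player A": 0.25, "Player B": 0.3333333, "Player C": 0.2}
for line in format_report(sample):
    print(line)
```

Formatting with {:.3f} rather than calling round() and printing the result ensures that an average such as 0.2 is shown as 0.200, matching the required #.### layout.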

For example, the correct output for the 1940 season is:

Ernie White: 0.429
Walker Cooper: 0.316
Pepper Martin: 0.316
Johnny Mize: 0.314
Enos Slaughter: 0.306
Terry Moore: 0.304
Ernie Koy: 0.301
Joe Orengo: 0.287
Jimmy Brown: 0.280
Marty Marion: 0.279
Creepy Crespi: 0.273
Johnny Hopp: 0.270
Don Gutteridge: 0.269
Mickey Owen: 0.264
Don Padgett: 0.242
Stu Martin: 0.238
Carl Doyle: 0.226
Ira Hutchinson: 0.222
Bill DeLancey: 0.222
Eddie Lake: 0.212
Lon Warneke: 0.209
Hal Epps: 0.200
Max Lanier: 0.200
Clyde Shoun: 0.190
Harry Walker: 0.185
Newt Kimball: 0.182
Bill McGee: 0.178
Carden Gillenwater: 0.160
Mort Cooper: 0.157
Bob Bowman: 0.061

Grading

We will be grading the following aspects of your work. There are 50 points total.

  1. Regular Expressions (15 Points):
    • Regular Expression 1 Correct (5 points)
    • Regular Expression 2 Correct (5 points)
    • Regular Expression 3 Correct (5 points)
  2. Baseball Stats Counter (35 Points):
    • Solution is written entirely in Python, except with permission from the instructor (8 points)
    • Correct usage of a regular expression to parse each line of the input file (8 points)
    • Script prints a usage message if a command-line argument is not present (5 points)
    • Output is correct for the 1940 test case (7 points)
    • Output is correct for another test case that the TA will run on demo day (7 points)