In Module 4, you will learn the fundamentals of Python, a popular scripting language, and Regular Expressions, which give you power in text processing.
This article contains your assignments for Module 4.
The following articles on the online class wiki textbook contain information that will help you complete the assignments.
Learn Python and Regular Expressions
Up to this point in CSE 330, you have used only one programming language: PHP. In Modules 4 and 5, you will learn a new language: Python. Before starting this assignment, you should look over the online textbook article about Python. Additionally, you may find it helpful to look over the online textbook article about Regular Expressions.
Write some Regular Expressions
Please write regular expressions that do each of the following:
- Match an input string that contains at least one occurrence of the string "hello world"
- Find all words in an input string that contain a triple vowel
- Match an input string that is entirely a flight code, of the format AA####, where AA is a two-letter uppercase airline code, and #### is a three- or four-digit flight number
Save these in a text file that you can show the TA on demo day.
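As a starting point for testing your own patterns, here is a minimal sketch of how the three tasks above could be checked in Python. The patterns are one possible reading of the requirements, not the only correct answers. It assumes that a "triple vowel" means three consecutive letters from a, e, i, o, u in either case, and that a "word" is a run of ASCII letters. The names HELLO_WORLD, TRIPLE_VOWEL_WORD, FLIGHT_CODE, and the helper functions are made up for this illustration.

```python
import re

# Assumption: a plain substring search is enough for "contains at least once".
HELLO_WORLD = re.compile(r"hello world")

# Assumption: three consecutive vowels (either case) inside a run of ASCII letters.
TRIPLE_VOWEL_WORD = re.compile(r"\b[a-z]*[aeiou]{3}[a-z]*\b", re.IGNORECASE)

# Two uppercase letters followed by three or four digits.
# Used with fullmatch() so the whole string must be a flight code.
FLIGHT_CODE = re.compile(r"[A-Z]{2}[0-9]{3,4}")


def contains_hello_world(text):
    """Return True if "hello world" occurs anywhere in text."""
    return HELLO_WORLD.search(text) is not None


def words_with_triple_vowel(text):
    """Return every word in text that contains three vowels in a row."""
    return TRIPLE_VOWEL_WORD.findall(text)


def is_flight_code(text):
    """Return True only if the entire string is a flight code."""
    return FLIGHT_CODE.fullmatch(text) is not None
```

Note the difference between search(), which finds a match anywhere in the string, and fullmatch(), which requires the pattern to cover the entire string.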
Baseball Stats Counter
The St. Louis Cardinals are the most legendary baseball team in the National League. In this exercise, you will be creating a Python script that reads box scores from a file and computes the Cardinals' players' batting averages in a particular season.
Tips and Instructions
- You should use Python when solving this problem.
- You should use a regular expression to parse players' names, at-bats, hits, and runs from the input file.
- You may want to create a class to hold and compute information about each player.
- Your script should take one command-line argument: the path to an input file. If no path is given, your script should print a usage message.
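The command-line requirement in the last tip can be sketched as follows. This is only one way to do it. The function names and the wording of the usage message are made up for this illustration, and treating extra arguments as an error is an assumption that the assignment does not spell out.

```python
import sys


def usage_message(program_name):
    """Build the usage text (the wording is an illustrative choice)."""
    return f"Usage: {program_name} <path-to-input-file>"


def input_path_from_args(argv):
    """Return the input path if exactly one argument was given, else None.

    Assumption: extra arguments are treated the same as a missing one.
    """
    if len(argv) != 2:
        return None
    return argv[1]


def main(argv):
    """Return an exit status: 1 after printing usage, 0 otherwise."""
    path = input_path_from_args(argv)
    if path is None:
        program_name = argv[0] if argv else "script"
        print(usage_message(program_name), file=sys.stderr)
        return 1
    # Placeholder: the real script would read and process the file here.
    print(f"Would read box scores from {path}")
    return 0
```

In a real script, the last line would typically be sys.exit(main(sys.argv)), so that a missing argument produces a nonzero exit status.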
A sample input file (for the 1940 season) can be found at: http://classes.engineering.wustl.edu/cse330/content/cardinals-1940.txt
The top of the file contains a citation, and what follows is an entire season of Cardinals Baseball box scores. Each game starts with a title; for example:
=== St.Louis Cardinals vs. Chicago Cubs, 1940-04-19 ===
What follows is the performance of each player in that game. The format of each line is as follows:
XXX batted # times with # hits and # runs
where XXX is the player's name (you may assume that the name is unique), and each # is an integer representing the number of "at bats", the number of "hits", and the number of "runs", respectively.
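To make the line format concrete, here is a minimal sketch of a regular expression with named groups that pulls the four fields out of one line. The names LINE_PATTERN and parse_line are made up for this illustration, and the sample numbers used with it are hypothetical rather than taken from the real input file.

```python
import re

# One player line: "XXX batted # times with # hits and # runs".
# The name is matched lazily so it stops at the first " batted ".
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?) batted (?P<at_bats>\d+) times "
    r"with (?P<hits>\d+) hits and (?P<runs>\d+) runs$"
)


def parse_line(line):
    """Return (name, at_bats, hits, runs), or None if the line is not a player line."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # game titles, the citation, and blank lines end up here
    return (
        match.group("name"),
        int(match.group("at_bats")),
        int(match.group("hits")),
        int(match.group("runs")),
    )
```

For example, parse_line("Marty Marion batted 4 times with 1 hits and 0 runs") returns ("Marty Marion", 4, 1, 0), while a game title line returns None and can simply be skipped.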
Your script needs to output each player's batting average across the season. A player's batting average is that player's total hits divided by that player's total at-bats throughout the entire season. For example, Marty Marion was at bat 435 times in 1940, and he had 121 hits, so his batting average was 0.278.
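The calculation above can be sketched with a small class that accumulates totals game by game. The class name PlayerStats and its methods are made up for this illustration, and returning 0.0 for a player with no at-bats is an assumption, since the assignment does not say how that case should be handled.

```python
class PlayerStats:
    """Running season totals for one player (illustrative sketch)."""

    def __init__(self, name):
        self.name = name
        self.at_bats = 0
        self.hits = 0

    def add_game(self, at_bats, hits):
        """Add one game's at-bats and hits to the season totals."""
        self.at_bats += at_bats
        self.hits += hits

    def batting_average(self):
        """Total hits divided by total at-bats for the season."""
        if self.at_bats == 0:
            return 0.0  # assumption: avoid dividing by zero
        return self.hits / self.at_bats
```

With the totals from the example, a PlayerStats object holding 435 at-bats and 121 hits gives 121 / 435, which is 0.278 when formatted to three decimal places with f"{average:.3f}".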
Each line in the output should have the format:
XXX: #.###
where XXX is the player's name, and #.### is the player's batting average, rounded to three decimal places.
The players should be sorted by batting average, with the highest batting average on top.
For example, the correct output for the 1940 season is:
Ernie White: 0.429
Walker Cooper: 0.316
Pepper Martin: 0.316
Johnny Mize: 0.314
Enos Slaughter: 0.306
Terry Moore: 0.304
Ernie Koy: 0.301
Joe Orengo: 0.287
Jimmy Brown: 0.280
Marty Marion: 0.279
Creepy Crespi: 0.273
Johnny Hopp: 0.270
Don Gutteridge: 0.269
Mickey Owen: 0.264
Don Padgett: 0.242
Stu Martin: 0.238
Carl Doyle: 0.226
Ira Hutchinson: 0.222
Bill DeLancey: 0.222
Eddie Lake: 0.212
Lon Warneke: 0.209
Hal Epps: 0.200
Max Lanier: 0.200
Clyde Shoun: 0.190
Harry Walker: 0.185
Newt Kimball: 0.182
Bill McGee: 0.178
Carden Gillenwater: 0.160
Mort Cooper: 0.157
Bob Bowman: 0.061
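The sorting and formatting step that produces output like the listing above can be sketched as follows. The function name format_averages and the player names in the usage note are hypothetical. The assignment does not say how players with equal averages should be ordered, so this sketch simply keeps them in their original order.

```python
def format_averages(averages):
    """Turn a {name: batting_average} mapping into sorted output lines.

    Highest average first. Python's sort is stable even with reverse=True,
    so tied players keep the order they had in the mapping (an assumption
    about tie handling, not a stated requirement).
    """
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    # ":.3f" rounds to three decimal places and pads with zeros, e.g. 0.3 -> "0.300"
    return [f"{name}: {average:.3f}" for name, average in ranked]
```

For example, format_averages({"Player A": 0.3, "Player B": 0.5}) returns ["Player B: 0.500", "Player A: 0.300"], and each element can then be printed on its own line.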
We will be grading the following aspects of your work. There are 50 points total.
- Regular Expressions (15 Points):
  - Regular Expression 1 Correct (5 points)
  - Regular Expression 2 Correct (5 points)
  - Regular Expression 3 Correct (5 points)
- Baseball Stats Counter (35 Points):
  - Solution is written entirely in Python, except with permission from the instructor (8 points)
  - Correct usage of a regular expression to parse each line of the input file (8 points)
  - Script prints a usage message if a command line argument is not present (5 points)
  - Output is correct for the 1940 test case (7 points)
  - Output is correct for another test case that the TA will run on demo day (7 points)