Difference between revisions of "Module 4"

From CSE330 Wiki
 
(82 intermediate revisions by 10 users not shown)
Line 1: Line 1:
== Install Python Tools ==
+
__NOTOC__
 +
In Module 4, you will learn the fundamentals of Python, a popular scripting language, and Regular Expressions, which give you power in text processing.
  
*<code>sudo yum install python-setuptools</code><br>
+
This article contains your assignments for Module 4.
*<code>sudo yum install python-devel</code>
 
  
== Python Assignment ==
+
== Using the Wiki ==
 +
{{RequiredInstructions|content=
 +
Text enclosed by <syntaxhighlight lang="bash" inline><</syntaxhighlight> and <syntaxhighlight lang="bash" inline>></syntaxhighlight> should be replaced by content unique to you.
  
*You will write a Python script that reads a set of student grades from a file and does some basic parsing and processing. [http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files This section] of the Python tutorial covers reading and writing files.
+
'''Example'''
*An example grades file is [http://research.engineering.wustl.edu/~todd/cse330/grades.txt here]. For the sake of simplicity you can assume that the file name is always going to be grades.txt.
+
 
*The first line of the file is of the form: NUM_LABS,NUM_EXAMS,LAB_WEIGHT
+
<source lang="bash">
*All of the other lines in the file are of the form: FIRSTNAME LASTNAME|GRADE|TYPE
+
$ sudo useradd -r -m -c "<My Full Name>" <usernameHere>
*The FIRSTNAME and LASTNAME fields are the student's full name (always only a first and last name), the GRADE is one grade for the student, and the TYPE describes what kind of assignment the grade was for, in this case either a 'lab' or an 'exam'.
+
</source>
*Your script should read in a grades file with the above format and perform the following:
+
becomes
*# Compute the final grade for every student, given that there were a total of NUM_LABS labs, NUM_EXAMS exams, and that the labs account for a total of LAB_WEIGHT percent of the grade.
+
<source lang="bash">
*# Compute the final letter grade based on the final numeric grade (A = 90 or above, B = at least 80 but below 90, C = at least 70 but below 80, D = at least 60 but below 70, F = below 60).
+
$ sudo useradd -r -m -c "Zach Cohn" zcohn
*# Ignore any blank lines.
+
</source>
*# Sort the students by last name, and print out final grades in that sorted order.
+
}}
*# No names should be hard coded into the script (you will run your script on another file with different student names when we grade your assignment).
+
 
*# Finally, the script should take one optional command-line argument, which is a string to match against student names.  Only names that match the string, in full or in part, should be printed with their final grades.
+
 
*#*For example, with the argument 'John' grades for both John Smith and John Locke should be printed (if those are the only two Johns in the grades.txt file)
+
== Reading ==
 +
 
 +
The following articles on the online class wiki textbook contain information that will help you complete the assignments.
 +
 
 +
* [[Python]]
 +
* [[Regular Expressions]]
 +
* [[FAQ - Mod 4]]
 +
 
 +
== Individual Assignments ==
 +
 
 +
=== Learn Python and Regular Expressions ===
 +
 
 +
Up to this point in CSE 330, you have used only one programming language: PHP.  In Module 4, you will learn a new language: Python.  Before starting this assignment, you should look over the [[Python|online textbook article about Python]].  You should be using version 3 (or greater) of Python. Additionally, you may find it helpful to look over the [[Regular Expressions|online textbook article about Regular Expressions]].
 +
 
 +
=== Write some Regular Expressions ===
 +
 
 +
Please write regular expressions that do each of the following:
 +
 
 +
# Match the substring "hello world" in a string.
 +
# Find all words in an input string that contain three or more consecutive vowels, regardless of case.
 +
# Match an input string that is '''entirely''' a flight code, of the format AA####, where '''AA''' is a two-letter uppercase airline code, and '''####''' is a three- or four-digit flight number.
 +
Examples:
 +
  Note: text in ''[[red]]'' is matched by the regular expression. Please note that each line is a '''separate''' test input string. For Regex 3, '''six''' separate test cases are given.
 +
  Regex 1:  Programmers will often write ''[[hello world]]'' as their first project with a programming language.
 +
  Regex 2:  The ''[[gooey]]'' peanut butter and jelly sandwich was a ''[[beauty]]''.
 +
  Regex 3:  ''[[AA312]]''
 +
            ''[[AA1298]]''
 +
            ''[[NW1234]]''
 +
            ''[[US443]]''
 +
            US31344
 +
            AA123 extratext
 +
 
 +
           
 +
 
 +
Save your regexes to text files – one each.
 +
 
 +
=== Baseball Stats Counter ===
 +
 
 +
The St. Louis Cardinals are the most legendary baseball team in the National League. In this exercise, you will create a Python script that reads box scores from a file and computes the Cardinals players' batting averages for a particular season.
 +
 
 +
==== Tips and Instructions ====
 +
 
 +
* You should write a Python script file to solve this problem.
 +
* You should use a regular expression to parse players' names, at-bats, hits, and runs from the input file.
 +
* You may want to create a class to hold and compute information about each player.
 +
* Your script should take one command-line argument: the path to an input file.  If no path is given, your program should print a usage message.
 +
 
 +
==== Input ====
 +
 
 +
Sample input files may be found here: http://classes.engineering.wustl.edu/cse330/content/cardinals/
 +
 
 +
The top of the file contains a citation, and what follows is an entire season of Cardinals Baseball box scores.  Each game starts with a title; for example:
 +
 
 +
=== St.Louis Cardinals vs. Chicago Cubs, 1940-04-19 ===
 +
 
 +
What follows is the performance of each player in that game.  The format of each line is as follows:
 +
 
 +
XXX batted # times with # hits and # runs
 +
 
 +
where '''XXX''' is the player's name (you may assume that the name is unique), and the three '''#''' fields are integers giving the number of "at bats", the number of "hits", and the number of "runs", respectively.
 +
 
 +
==== Output ====
 +
 
 +
Your script needs to output every player's batting average across the season.  A player's batting average is that player's total hits divided by that player's total at-bats throughout the entire season.  For example, Johnny Hopp was at bat 149 times in 1940, and he had 41 hits, so his batting average was 0.275.
 +
 
 +
Each line in the output should have the format:
 +
 
 +
XXX: #.###
 +
 
 +
where '''XXX''' is the player's name, and '''#.###''' is the player's batting average, '''rounded''' (not truncated) to three decimal places.
 +
 
 +
The players should be sorted by batting average, with the highest batting average on top.  If two players have the exact same batting average (before rounding occurs), the order between those two players is unimportant.
 +
 
 +
For example, the correct output for the 1940 season is:
 +
<!-- Jimmy Brown was 0.280 and Ira Hutchinson was 0.125 -->
 +
<source lang="text">
 +
Pepper Martin: 0.316
 +
Walker Cooper: 0.316
 +
Johnny Mize: 0.314
 +
Ernie Koy: 0.310
 +
Enos Slaughter: 0.306
 +
Joe Medwick: 0.304
 +
Terry Moore: 0.304
 +
Joe Orengo: 0.286
 +
Jimmy Brown: 0.280
 +
Marty Marion: 0.279
 +
Don Gutteridge: 0.276
 +
Johnny Hopp: 0.275
 +
Creepy Crespi: 0.273
 +
Mickey Owen: 0.265
 +
Bill DeLancey: 0.250
 +
Don Padgett: 0.242
 +
Stu Martin: 0.238
 +
Eddie Lake: 0.222
 +
Hal Epps: 0.214
 +
Lon Warneke: 0.209
 +
Harry Walker: 0.185
 +
Max Lanier: 0.179
 +
Bill McGee: 0.178
 +
Carl Doyle: 0.174
 +
Mort Cooper: 0.157
 +
Clyde Shoun: 0.145
 +
Carden Gillenwater: 0.130
 +
Bob Bowman: 0.067
 +
</source>
 +
 
 +
== Grading ==
 +
 
 +
We will be grading the following aspects of your work.  There are 50 points total.
 +
 
 +
'''Assignments (including code) must be committed to GitHub by the end of class on the due date (commit early and often). Failing to commit by the end of class on the due date will result in a 0. '''
 +
 
 +
Upload your baseball code and the files containing your regular expressions.
 +
 
 +
# '''Regular Expressions (15 Points):'''
 +
#* '''Your regex files should contain your regex (with no delimiters or flags) and nothing else.'''
 +
#* Regular Expression 1 is correct and saved in a file named <code>regex1.txt</code> (5 points)
 +
#* Regular Expression 2 is correct and saved in a file named <code>regex2.txt</code> (5 points)
 +
#* Regular Expression 3 is correct and saved in a file named <code>regex3.txt</code> (5 points)
 +
# '''Baseball Stats Counter (35 Points):'''
 +
#* Solution is written entirely in Python and saved in a file named <code>baseball.py</code> (8 points)
 +
#*: ''Note: Failing to write your code in Python 3 will result in losing, at a minimum, points for this category.''
 +
#* Correct usage of one or more regular expressions to parse and extract data from each line of the input file (8 points)
 +
#*: ''Note: You should not be using <code>str.split</code> to extract data.''
 +
#* Script prints a usage message if a command line argument is not present (4 points)
 +
#*: ''Note: For an example of a usage message, see [https://en.wikipedia.org/wiki/Usage_message#Examples this link.]''
 +
#* Output is correct for all test cases (15 points)
 +
#*: ''This includes sorting and rounding.''
  
 
[[Category:Module 4]]
 
 
[[Category:Modules]]
 

Latest revision as of 19:34, 10 January 2023

In Module 4, you will learn the fundamentals of Python, a popular scripting language, and Regular Expressions, which give you power in text processing.

This article contains your assignments for Module 4.

Using the Wiki

Text enclosed by < and > should be replaced by content unique to you.

Example

$ sudo useradd -r -m -c "<My Full Name>" <usernameHere>

becomes

$ sudo useradd -r -m -c "Zach Cohn" zcohn


Reading

The following articles on the online class wiki textbook contain information that will help you complete the assignments.

  • Python
  • Regular Expressions
  • FAQ - Mod 4

Individual Assignments

Learn Python and Regular Expressions

Up to this point in CSE 330, you have used only one programming language: PHP. In Module 4, you will learn a new language: Python. Before starting this assignment, you should look over the online textbook article about Python. You should be using version 3 (or greater) of Python. Additionally, you may find it helpful to look over the online textbook article about Regular Expressions.

Write some Regular Expressions

Please write regular expressions that do each of the following:

  1. Match the substring "hello world" in a string.
  2. Find all words in an input string that contain three or more consecutive vowels, regardless of case.
  3. Match an input string that is entirely a flight code, of the format AA####, where AA is a two-letter uppercase airline code, and #### is a three- or four-digit flight number.

Examples:

  Note: text in [brackets] is matched by the regular expression. Each line is a separate test input string. For Regex 3, six separate test cases are given; the last two should not match.
  Regex 1:  Programmers will often write [hello world] as their first project with a programming language.
  Regex 2:  The [gooey] peanut butter and jelly sandwich was a [beauty].
  Regex 3:  [AA312]
            [AA1298]
            [NW1234]
            [US443]
            US31344
            AA123 extratext


Save your regexes to text files – one each.
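
Before committing, it can help to load a saved regex from its file and try it against a few test strings. The following is a minimal sketch, not a required workflow: the helper `load_regex`, the file name `sample_regex.txt`, and the sample pattern are all hypothetical, and the pattern is deliberately not a solution to any of the three exercises. It illustrates how `re.search` (match anywhere), `re.findall` (collect every match), and `re.fullmatch` (the entire string must match) differ.

```python
import re
import tempfile
from pathlib import Path


def load_regex(path):
    """Read a regex from a text file, dropping the trailing newline editors often add."""
    return Path(path).read_text().rstrip("\n")


# Hypothetical sample pattern (not an assignment answer): one or more digits.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = Path(tmp) / "sample_regex.txt"
    sample_file.write_text(r"\d+" + "\n")
    pattern = load_regex(sample_file)

text = "room 12, floor 3"
found_anywhere = re.search(pattern, text) is not None   # a match exists somewhere
all_matches = re.findall(pattern, text)                 # every non-overlapping match
whole_string = re.fullmatch(pattern, text) is not None  # the entire string must match
print(found_anywhere, all_matches, whole_string)
```

Here `re.search` and `re.findall` find the digit runs inside the string, while `re.fullmatch` fails because the string contains more than digits.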

Baseball Stats Counter

The St. Louis Cardinals are the most legendary baseball team in the National League. In this exercise, you will create a Python script that reads box scores from a file and computes the Cardinals players' batting averages for a particular season.

Tips and Instructions

  • You should write a Python script file to solve this problem.
  • You should use a regular expression to parse players' names, at-bats, hits, and runs from the input file.
  • You may want to create a class to hold and compute information about each player.
  • Your script should take one command-line argument: the path to an input file. If no path is given, your program should print a usage message.
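
The command-line handling described in the last tip could be sketched as follows. This is only one possible shape: the function names `get_input_path` and `main`, the wording of the usage message, and the choice to print it to standard error and return a non-zero status are assumptions, not requirements stated on this page.

```python
import sys


def get_input_path(argv):
    """Return the input path from argv, or None if exactly one argument was not given."""
    if len(argv) != 2:
        return None
    return argv[1]


def main(argv):
    path = get_input_path(argv)
    if path is None:
        # Usage wording is illustrative; adapt it as you see fit.
        print(f"Usage: python3 {argv[0]} <path-to-input-file>", file=sys.stderr)
        return 1
    print(f"Reading box scores from {path}")
    return 0


# Simulate running the script with no argument: the usage message is printed.
exit_code = main(["baseball.py"])
```

In a real script, the last line would typically be replaced by a call such as `sys.exit(main(sys.argv))` so that the actual command-line arguments are used.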

Input

Sample input files may be found here: http://classes.engineering.wustl.edu/cse330/content/cardinals/

The top of the file contains a citation, and what follows is an entire season of Cardinals Baseball box scores. Each game starts with a title; for example:

=== St.Louis Cardinals vs. Chicago Cubs, 1940-04-19 ===

What follows is the performance of each player in that game. The format of each line is as follows:

XXX batted # times with # hits and # runs

where XXX is the player's name (you may assume that the name is unique), and the three # fields are integers giving the number of "at bats", the number of "hits", and the number of "runs", respectively.
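
A regular expression for this line format might look like the sketch below. It assumes every player line follows the stated format exactly; the names `LINE_RE` and `parse_line` and the sample player line are made up for illustration, and the real input files may contain variations this pattern does not handle.

```python
import re

# One possible pattern for "XXX batted # times with # hits and # runs".
LINE_RE = re.compile(
    r"^(?P<name>.+?) batted (?P<at_bats>\d+) times with "
    r"(?P<hits>\d+) hits and (?P<runs>\d+) runs$"
)


def parse_line(line):
    """Return (name, at_bats, hits, runs) for a player line, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("name"),
        int(match.group("at_bats")),
        int(match.group("hits")),
        int(match.group("runs")),
    )


# Made-up player line, followed by the game title quoted above.
player = parse_line("Johnny Hopp batted 4 times with 1 hits and 0 runs")
title = parse_line("=== St.Louis Cardinals vs. Chicago Cubs, 1940-04-19 ===")
print(player, title)
```

Returning None for non-matching lines lets the same function skip the citation, the game titles, and blank lines without special cases.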

Output

Your script needs to output every player's batting average across the season. A player's batting average is that player's total hits divided by that player's total at-bats throughout the entire season. For example, Johnny Hopp was at bat 149 times in 1940, and he had 41 hits, so his batting average was 0.275.
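
The season-long calculation can be sketched as below. The per-game records and the names `game_records`, `totals`, and `batting_average` are hypothetical; only the Johnny Hopp figures (41 hits in 149 at-bats) come from the text above. This page does not say how to treat a player with zero at-bats, so the sketch does not handle that case.

```python
from collections import defaultdict

# Hypothetical per-game records: (player name, at-bats, hits).
game_records = [
    ("Player A", 4, 1),
    ("Player B", 3, 2),
    ("Player A", 5, 2),
]

# Accumulate season totals per player as [at_bats, hits].
totals = defaultdict(lambda: [0, 0])
for name, at_bats, hits in game_records:
    totals[name][0] += at_bats
    totals[name][1] += hits


def batting_average(hits, at_bats):
    """Total hits divided by total at-bats."""
    return hits / at_bats


# The Johnny Hopp figures quoted above: 41 hits in 149 at-bats.
hopp = batting_average(41, 149)
print(f"{hopp:.3f}")  # prints 0.275
```

Note that the average is computed from the season totals, not by averaging per-game averages, which would give a different result.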

Each line in the output should have the format:

XXX: #.###

where XXX is the player's name, and #.### is the player's batting average, rounded (not truncated) to three decimal places.

The players should be sorted by batting average, with the highest batting average on top. If two players have the exact same batting average (before rounding occurs), the order between those two players is unimportant.
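
Sorting and formatting could then be done along these lines. The season totals here are invented for illustration, and the variable names are arbitrary. The sketch sorts on the unrounded average and rounds only when formatting, using the `.3f` format specifier, which rounds instead of truncating.

```python
# Hypothetical season totals: name -> (hits, at_bats).
season_totals = {
    "Player A": (3, 9),
    "Player B": (2, 3),
    "Player C": (1, 8),
}

# Compute unrounded averages, then sort from highest to lowest.
averages = {name: hits / at_bats for name, (hits, at_bats) in season_totals.items()}
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Round only at output time, to three decimal places.
output_lines = [f"{name}: {average:.3f}" for name, average in ranked]
for line in output_lines:
    print(line)
```

This prints Player B (0.667), then Player A (0.333), then Player C (0.125).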

For example, the correct output for the 1940 season is:

Pepper Martin: 0.316
Walker Cooper: 0.316
Johnny Mize: 0.314
Ernie Koy: 0.310
Enos Slaughter: 0.306
Joe Medwick: 0.304
Terry Moore: 0.304
Joe Orengo: 0.286
Jimmy Brown: 0.280
Marty Marion: 0.279
Don Gutteridge: 0.276
Johnny Hopp: 0.275
Creepy Crespi: 0.273
Mickey Owen: 0.265
Bill DeLancey: 0.250
Don Padgett: 0.242
Stu Martin: 0.238
Eddie Lake: 0.222
Hal Epps: 0.214
Lon Warneke: 0.209
Harry Walker: 0.185
Max Lanier: 0.179
Bill McGee: 0.178
Carl Doyle: 0.174
Mort Cooper: 0.157
Clyde Shoun: 0.145
Carden Gillenwater: 0.130
Bob Bowman: 0.067

Grading

We will be grading the following aspects of your work. There are 50 points total.

Assignments (including code) must be committed to GitHub by the end of class on the due date (commit early and often). Failing to commit by the end of class on the due date will result in a 0.

Upload your baseball code and the files containing your regular expressions.

  1. Regular Expressions (15 Points):
    • Your regex files should contain your regex (with no delimiters or flags) and nothing else.
    • Regular Expression 1 is correct and saved in a file named regex1.txt (5 points)
    • Regular Expression 2 is correct and saved in a file named regex2.txt (5 points)
    • Regular Expression 3 is correct and saved in a file named regex3.txt (5 points)
  2. Baseball Stats Counter (35 Points):
    • Solution is written entirely in Python and saved in a file named baseball.py (8 points)
      Note: Failing to write your code in Python 3 will result in losing, at a minimum, points for this category.
    • Correct usage of one or more regular expressions to parse and extract data from each line of the input file (8 points)
      Note: You should not be using str.split to extract data.
    • Script prints a usage message if a command line argument is not present (4 points)
      Note: For an example of a usage message, see https://en.wikipedia.org/wiki/Usage_message#Examples.
    • Output is correct for all test cases (15 points)
      This includes sorting and rounding.