Difference between revisions of "Cholera MapReduce Application"
Revision as of 04:35, 5 March 2023
Motivation
Epidemiology is the important study of "why certain people are getting ill."
We get a chance to make sense of the data in a relatively open-ended studio.
Background
Video: Extra History: The Broad Street Pump
Imagine you are a physician in 1854 London in the midst of a cholera outbreak. Your theory that contaminated water is the cause meets resistance from the medical establishment, which holds that cholera spreads through the air.
Imagine further that your friend Ada has taught you to program.
Code To Use
You will be asked to produce evidence from CholeraDeath locations and WaterPump locations. The CholeraDeath data is from this GIS Analysis and made available in Java in SohoCholeraOutbreak1854.getDeaths(). source: deaths.txt.
The WaterPump data is available via WaterPump.values()
source: pumps.txt.
Code To Implement
In order to provide a more open-ended assignment, we have abstracted out different aspects of the studio. In particular, getValueRepresentation() allows you to specify whether your CholeraApp sees high or low numbers as suspects. For example, if you return ValueRepresentation.HIGH_NUMBERS_SUSPECT, you are indicating that a higher number is more likely to indicate that the water pump is the source of the cholera outbreak. This will aid the Visualization in presenting your findings.
We have anticipated two basic approaches to this problem (with a variation^2 on one of the approaches). You may choose to implement an integer-based or a double-based CholeraApp. If you take a different approach than one we have anticipated, that is fine. Get it checked out by an instructor.
CholeraUtils
createApp
Return a new CholeraApp. Depending on which approach you plan to implement, it will be either IntegerCholeraApp() or DoubleCholeraApp().
IntegerCholeraApp or DoubleCholeraApp
class: | IntegerCholeraApp.java or DoubleCholeraApp.java |
methods: | getValueRepresentation, createMapper, createCollector |
package: | mapreduce.apps.cholera.exercise |
source folder: | student/src/main/java |
getValueRepresentation
method: public static ValueRepresentation getValueRepresentation()
(sequential implementation only)
Return one of the three values of the enum ValueRepresentation to aid the visualization and testing. You can access a specific enum constant by typing ValueRepresentation. followed by the constant's name.
public enum ValueRepresentation { HIGH_NUMBERS_SUSPECT, LOW_NUMBERS_SUSPECT, LOW_NUMBERS_SUSPECT_SQUARED }
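To make the role of the enum concrete, here is a minimal sketch of how a consumer such as the visualization might interpret it. The helper mostSuspect, the pump names, and the numbers are hypothetical and not part of the course code. Treating LOW_NUMBERS_SUSPECT_SQUARED the same as LOW_NUMBERS_SUSPECT when ranking is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ValueRepresentationSketch {
    // Same constants as the enum shown above, redeclared so the sketch is self-contained.
    enum ValueRepresentation { HIGH_NUMBERS_SUSPECT, LOW_NUMBERS_SUSPECT, LOW_NUMBERS_SUSPECT_SQUARED }

    // Hypothetical helper: returns the pump name whose value is most suspicious
    // under the given representation (assumption: both LOW_* variants rank low values first).
    static String mostSuspect(Map<String, Double> valueByPump, ValueRepresentation representation) {
        boolean highIsSuspect = representation == ValueRepresentation.HIGH_NUMBERS_SUSPECT;
        String best = null;
        double bestValue = Double.NaN;
        for (Map.Entry<String, Double> entry : valueByPump.entrySet()) {
            double value = entry.getValue();
            boolean better = best == null || (highIsSuspect ? value > bestValue : value < bestValue);
            if (better) {
                best = entry.getKey();
                bestValue = value;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up values for two placeholder pumps.
        Map<String, Double> values = new LinkedHashMap<>();
        values.put("PUMP_A", 12.0);
        values.put("PUMP_B", 340.0);
        System.out.println(mostSuspect(values, ValueRepresentation.HIGH_NUMBERS_SUSPECT));
        System.out.println(mostSuspect(values, ValueRepresentation.LOW_NUMBERS_SUSPECT));
    }
}
```

The same reduced numbers point at different pumps depending on the representation, which is why the app has to declare it.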
createMapper
method: public static Mapper<CholeraDeath, WaterPump, Number> createMapper()
(sequential implementation only)
We have implemented WaterPump as an enum. You can access all the enum constants of any enum via its values() method. For example:
for (WaterPump pump : WaterPump.values()) { }
If you need inspiration on how to implement a Mapper, revisit past assignments such as CardMapper and WordCountMapper, which implement the Mapper<E, K, V> interface. Alternatively, open the Mapper<E, K, V> interface directly to see which method you need to implement.
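As a rough illustration of the shape such a Mapper can take, the sketch below emits one (pump, distance) pair per pump for each death, which is only one of many possible double-based approaches. Everything here is a stand-in: the Mapper interface and its map method, the Location, Death, and Pump types, and the coordinates are assumptions made so the example compiles on its own. Check the real course interfaces before borrowing any of it.

```java
import java.util.function.BiConsumer;

final class DistanceMapperSketch {
    // Stand-in for the course's Mapper<E, K, V>; the real method name and signature may differ.
    interface Mapper<E, K, V> {
        void map(E item, BiConsumer<K, V> keyValuePairConsumer);
    }

    // Stand-in for the course's Location type.
    static final class Location {
        final double x;
        final double y;
        Location(double x, double y) { this.x = x; this.y = y; }
        double distanceTo(Location other) { return Math.hypot(x - other.x, y - other.y); }
    }

    // Stand-in for CholeraDeath; the accessor name is hypothetical.
    static final class Death {
        private final Location location;
        Death(Location location) { this.location = location; }
        Location location() { return location; }
    }

    // Stand-in for the WaterPump enum, with made-up coordinates.
    enum Pump {
        NORTH(new Location(0.0, 10.0)), SOUTH(new Location(0.0, -3.0));
        private final Location location;
        Pump(Location location) { this.location = location; }
        Location location() { return location; }
    }

    // For each death, emit the distance to every pump, iterating with values().
    static Mapper<Death, Pump, Number> createMapper() {
        return (death, consumer) -> {
            for (Pump pump : Pump.values()) {
                consumer.accept(pump, death.location().distanceTo(pump.location()));
            }
        };
    }
}
```

Under this approach a smaller reduced value would point at a more suspicious pump, so it would pair with one of the LOW_NUMBERS_SUSPECT representations.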
createReducer
Depending on your approach, you may be able to reuse one of your existing Reducers. Otherwise, implement one to go with your Mapper.
method: public static Reducer<? extends Number, ?, ? extends Number> createReducer()
(sequential implementation only)
Note: if you decide to build your own Reducer, the following will likely be a fine characteristics method:
@Override public Set<Characteristics> characteristics() { return EnumSet.noneOf(Characteristics.class); }
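The characteristics() override above has the same shape as the one in the JDK's java.util.stream.Collector. Assuming the course's Reducer follows that interface (an assumption, not something this page states), the sketch below implements the JDK interface directly as a summing collector. The class name SummingCollectorSketch is made up for illustration.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Sums Number values into a Double, using a one-element double[] as the mutable container.
final class SummingCollectorSketch implements Collector<Number, double[], Double> {
    @Override
    public Supplier<double[]> supplier() {
        return () -> new double[1]; // fresh container, initial sum 0.0
    }

    @Override
    public BiConsumer<double[], Number> accumulator() {
        return (container, value) -> container[0] += value.doubleValue();
    }

    @Override
    public BinaryOperator<double[]> combiner() {
        return (a, b) -> {
            a[0] += b[0]; // merge two partial sums
            return a;
        };
    }

    @Override
    public Function<double[], Double> finisher() {
        return container -> container[0]; // convert the container to the result type
    }

    @Override
    public Set<Characteristics> characteristics() {
        // No IDENTITY_FINISH: the finisher really does convert double[] to Double.
        return EnumSet.noneOf(Characteristics.class);
    }
}
```

With the JDK interface, this can be tried out via Stream.of(1.5, 2.0, 3.5).collect(new SummingCollectorSketch()). Returning an empty characteristics set is the conservative choice, since it promises nothing about ordering, concurrency, or the finisher.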
OPTIONAL getThresholdIfApplicable
We have noted that some students have been going with a threshold approach. In order to support this in the visualization and the test (and to minimize the chances of interfering with students who have already started) we have added a separate class file CholeraThreshold. You need not concern yourself with this file if you are going with a threshold-less strategy.
class: | CholeraApp.java | |
methods: | getThresholdIfApplicable | |
package: | mapreduce.apps.cholera.studio | |
source folder: | student/src/main/java |
method: public static double getThresholdIfApplicable()
(sequential implementation only)
Return the threshold if you are using one in your Mapper. Otherwise return Double.NaN.
public static double getThresholdIfApplicable() {
    final boolean IS_THRESHOLD_APPLICABLE = false;
    if (IS_THRESHOLD_APPLICABLE) {
        throw new NotYetImplementedException();
    } else {
        return Double.NaN;
    }
}
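One detail worth keeping in mind when Double.NaN acts as the "no threshold" marker: NaN never compares equal to anything, including itself, so code that reads the threshold should test it with Double.isNaN rather than ==. The sketch below shows this with two hypothetical helpers that are not part of the course code.

```java
final class ThresholdSketch {
    // Hypothetical helper: true only when a real threshold was supplied.
    static boolean isThresholdApplicable(double threshold) {
        // threshold == Double.NaN would always be false, so use Double.isNaN.
        return !Double.isNaN(threshold);
    }

    // Hypothetical check a threshold-based Mapper might apply to a distance.
    static boolean isWithin(double distance, double threshold) {
        return isThresholdApplicable(threshold) && distance <= threshold;
    }

    public static void main(String[] args) {
        System.out.println(isThresholdApplicable(Double.NaN));
        System.out.println(isWithin(40.0, 50.0));
    }
}
```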
Testing Your Solution
Visualization
Original Map Drawn By John Snow:
Our Visualization App:
class: | CholeraOutbreakVisualizationApp.java | VIZ |
package: | mapreduce.apps.cholera.viz | |
source folder: | student/src//java |
Correctness
If you have chosen to go a different route than the ones we anticipated, do not worry about passing the test suite. Demo your work to an instructor in class and we can discuss its fitness.
class: | CholeraStudioTestSuite.java | |
package: | mapreduce | |
source folder: | testing/src/test/java |