Difference between revisions of "Cholera MapReduce Application"
(32 intermediate revisions by 2 users not shown) | |||
Line 4: | Line 4: | ||
[https://en.wikipedia.org/wiki/Epidemiology Epidemiology is the important study of "why certain people are getting ill."] | [https://en.wikipedia.org/wiki/Epidemiology Epidemiology is the important study of "why certain people are getting ill."] | ||
− | We get a chance to make sense of the data in a relatively open ended studio. | + | We get a chance to make sense of the data in a relatively open-ended studio. |
=Background= | =Background= | ||
+ | |||
+ | {{CollapsibleYouTube|Extra History: The Broad Street Pump|<youtube>TLpzHHbFrHY</youtube> | ||
+ | |||
+ | <youtube>1jlsyucUwpo</youtube> | ||
+ | |||
+ | <youtube>9NVT6iZP2qg</youtube>}} | ||
+ | |||
Imagine you are [https://en.wikipedia.org/wiki/John_Snow a physician] in [https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak 1854 London in the midst of a cholera outbreak]. Your theory that contaminated water is the cause meets resistance from the medical establishment which holds that it is spread via the air. | Imagine you are [https://en.wikipedia.org/wiki/John_Snow a physician] in [https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak 1854 London in the midst of a cholera outbreak]. Your theory that contaminated water is the cause meets resistance from the medical establishment which holds that it is spread via the air. | ||
− | Imagine further that your friend [https://en.wikipedia.org/wiki/Ada_Lovelace Ada] has | + | Imagine further that your friend [https://en.wikipedia.org/wiki/Ada_Lovelace Ada] has taught you to program. |
+ | |||
+ | =Lecture= | ||
+ | {{CollapsibleYouTube|CSE 231s Lecture: Cholera|<youtube>Ny5sjWGVuy0</youtube>}} | ||
=Code To Use= | =Code To Use= | ||
− | + | You will be asked to produce evidence from CholeraDeath locations and WaterPump locations. The CholeraDeath data is from [https://www1.udel.edu/johnmack/frec682/cholera/cholera2.html this GIS Analysis] and made available in Java in [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/mapreduce/apps/cholera/util/SohoCholeraOutbreak1854.html SohoCholeraOutbreak1854]. source: [https://web.archive.org/web/20160822002253/http://www1.udel.edu/johnmack/frec682/cholera/deaths.txt deaths.txt]. | |
− | + | The WaterPump data is available via [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/WaterPump.html WaterPump.values()]<br/>source: [https://web.archive.org/web/20160822002256/http://www1.udel.edu/johnmack/frec682/cholera/pumps.txt pumps.txt]. | |
− | |||
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/Location.html class Location] | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/Location.html class Location] | ||
− | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/Location.html# | + | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/Location.html#distanceTo-mapreduce.apps.cholera.core.Location- double distanceTo( Location other )] |
+ | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/Location.html#x double x()] | ||
+ | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/Location.html#y double y()] | ||
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/CholeraDeath.html class CholeraDeath] | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/CholeraDeath.html class CholeraDeath] | ||
− | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/CholeraDeath.html# | + | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/CholeraDeath.html#location-- Location location()] |
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/WaterPump.html enum WaterPump] | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/WaterPump.html enum WaterPump] | ||
− | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/WaterPump.html# | + | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cholera/core/WaterPump.html#location-- Location location()] |
+ | |||
+ | =Code To Implement= | ||
+ | In order to provide a more open-ended assignment, we have abstracted out different aspects of the studio. In particular, <code>valueRepresentation()</code> allows you to specify whether your CholeraApp sees high or low numbers as suspects. For example, if you return <code>ValueRepresentation.HIGH_NUMBERS_SUSPECT</code>, you are indicating that a higher number is more likely to indicate that the water pump is the source of the cholera outbreak. This will aid the [[#Visualization]] in presenting your findings. | ||
+ | |||
+ | We have anticipated two basic approaches to this problem (with a variation^2 on one of the approaches). You may choose to implement an integer-based or a double-based CholeraApp. If you take a different approach than one we have anticipated, that is fine. Get it checked out by an instructor. | ||
+ | |||
+ | ==CholeraOutbreak== | ||
+ | |||
+ | {{CodeToImplement|CholeraOutbreak|produceEvidence|mapreduce.apps.cholera.exercise}} | ||
+ | |||
+ | ===valueRepresentation=== | ||
+ | {{Sequential|public static ValueRepresentation valueRepresentation()}} | ||
+ | |||
+ | You have been provided with this implementation which should be sufficient: | ||
+ | <syntaxhighlight lang="java"> | ||
+ | public static ValueRepresentation valueRepresentation() { | ||
+ | return ValueRepresentation.AUTO_DETECT; | ||
+ | } | ||
+ | </syntaxhighlight> | ||
+ | |||
+ | ===thresholdIfApplicable=== | ||
+ | {{Sequential|public static Optional<Double> thresholdIfApplicable()}} | ||
+ | |||
+ | For the vast majority of students, a threshold value will not be applicable to their solution and thus this method need not be changed: | ||
+ | |||
+ | <syntaxhighlight lang="java"> | ||
+ | public static Optional<Double> thresholdIfApplicable() { | ||
+ | final boolean IS_THRESHOLD_APPLICABLE = false; | ||
+ | if (IS_THRESHOLD_APPLICABLE) { | ||
+ | throw new NotYetImplementedException(); | ||
+ | } else { | ||
+ | return Optional.empty(); | ||
+ | } | ||
+ | } | ||
+ | </syntaxhighlight> | ||
− | + | However, if your solution has a threshold value which would be necessary for visualization and testing, return it here via Optional.of(threshold). | |
− | |||
− | = | + | Again, you need not concern yourself with this method if you are going with a threshold-less strategy. |
− | {{ | + | |
+ | |||
+ | ===produceEvidence(choleraDeaths)=== | ||
+ | {{Sequential|public static Map<WaterPump, ? extends Number> produceEvidence(CholeraDeath[] choleraDeaths)}} | ||
+ | |||
+ | Note: [https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html Integer] and [https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html Double] extend [https://docs.oracle.com/javase/8/docs/api/java/lang/Number.html Number]. | ||
+ | |||
+ | Here is how you might leverage MapReduce to solve this problem. | ||
− | + | If going the Integer route: | |
− | |||
− | == | + | <syntaxhighlight lang="java"> |
− | + | Mapper<CholeraDeath, WaterPump, Integer> mapper = null; // TODO | |
+ | AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = null; // TODO: note you have already created a class which will suffice here. | ||
+ | MapReduceFramework<CholeraDeath, WaterPump, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer); | ||
+ | return framework.mapReduceAll(choleraDeaths); | ||
+ | </syntaxhighlight> | ||
− | + | If going the Double route: | |
− | + | <syntaxhighlight lang="java"> | |
− | + | Mapper<CholeraDeath, WaterPump, Double> mapper = null; // TODO | |
+ | AccumulatorCombinerReducer<Double, ?, Double> accumulatorCombinerReducer = null; // TODO | ||
+ | MapReduceFramework<CholeraDeath, WaterPump, Double, ?, Double> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer); | ||
+ | return framework.mapReduceAll(choleraDeaths); | ||
+ | </syntaxhighlight> | ||
− | + | {{Alert|You are allowed to and encouraged to create additional class file(s) and/or use code from previous exercises.}} | |
− | {{ | ||
=Testing Your Solution= | =Testing Your Solution= | ||
Line 50: | Line 108: | ||
Original Map Drawn By John Snow: | Original Map Drawn By John Snow: | ||
− | [[File:Snow-cholera-map-1.jpg| | + | [[File:Snow-cholera-map-1.jpg|400px]] |
Our Visualization App: | Our Visualization App: | ||
− | {{Viz| | + | {{Viz|CholeraOutbreakViz|mapreduce.apps.cholera.viz|main}} |
[[File:CholeraOutbreak.png|600px]] | [[File:CholeraOutbreak.png|600px]] | ||
==Correctness== | ==Correctness== | ||
− | {{TestSuite| | + | If you have chosen to go a different route than the ones we anticipated, do not worry about passing the test suite. Demo your work to an instructor in class and we can discuss its fitness. |
+ | |||
+ | {{TestSuite|_CholeraOutbreakTestSuite|mapreduce.apps.cholera.exercise}} | ||
+ | |||
+ | =Pledge, Acknowledgments, Citations= | ||
+ | {{Pledge|map-reduce-cholera-app}} |
Latest revision as of 03:35, 18 March 2024
Motivation
Epidemiology is the important study of "why certain people are getting ill."
We get a chance to make sense of the data in a relatively open-ended studio.
Background
Video: Extra History: The Broad Street Pump
Imagine you are a physician in 1854 London in the midst of a cholera outbreak. Your theory that contaminated water is the cause meets resistance from the medical establishment which holds that it is spread via the air.
Imagine further that your friend Ada has taught you to program.
Lecture
Video: CSE 231s Lecture: Cholera
Code To Use
You will be asked to produce evidence from CholeraDeath locations and WaterPump locations. The CholeraDeath data is from this GIS Analysis and is made available in Java in SohoCholeraOutbreak1854 (source: deaths.txt).
The WaterPump data is available via WaterPump.values() (source: pumps.txt).
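To make the API concrete, here is a minimal sketch of finding the pump nearest to a given location. The Location and Pump types below are hypothetical stand-ins for the course's Location and WaterPump (whose documented methods are distanceTo(Location other), x(), y(), and location()). The pump names and coordinates are invented, and Euclidean distance is an assumption about what distanceTo computes.

```java
import java.util.Arrays;
import java.util.Comparator;

class NearestPumpSketch {
    /** Hypothetical stand-in for the course's Location class. */
    static final class Location {
        private final double x;
        private final double y;

        Location(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double x() { return x; }

        double y() { return y; }

        /** Euclidean distance; an assumption about what distanceTo computes. */
        double distanceTo(Location other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    /** Hypothetical stand-in for the WaterPump enum; coordinates are invented. */
    enum Pump {
        NORTH(new Location(0.0, 10.0)),
        SOUTH(new Location(0.0, -10.0));

        private final Location location;

        Pump(Location location) { this.location = location; }

        Location location() { return location; }
    }

    /** Returns the pump whose location is closest to the given point. */
    static Pump nearestPump(Location point) {
        return Arrays.stream(Pump.values())
                .min(Comparator.comparingDouble(p -> p.location().distanceTo(point)))
                .get(); // safe here because the enum is never empty
    }

    public static void main(String[] args) {
        Location death = new Location(1.0, 7.0);
        System.out.println(nearestPump(death)); // prints NORTH
    }
}
```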
Code To Implement
In order to provide a more open-ended assignment, we have abstracted out different aspects of the studio. In particular, valueRepresentation() allows you to specify whether your CholeraApp sees high or low numbers as suspects. For example, if you return ValueRepresentation.HIGH_NUMBERS_SUSPECT, you are indicating that a higher number is more likely to indicate that the water pump is the source of the cholera outbreak. This will aid the Visualization in presenting your findings.
We have anticipated two basic approaches to this problem (with a variation^2 on one of the approaches). You may choose to implement an integer-based or a double-based CholeraApp. If you take a different approach than one we have anticipated, that is fine. Get it checked out by an instructor.
CholeraOutbreak
class: CholeraOutbreak.java
methods: produceEvidence
package: mapreduce.apps.cholera.exercise
source folder: student/src/main/java
valueRepresentation
method: public static ValueRepresentation valueRepresentation()
(sequential implementation only)
You have been provided with this implementation which should be sufficient:
public static ValueRepresentation valueRepresentation() {
    return ValueRepresentation.AUTO_DETECT;
}
thresholdIfApplicable
method: public static Optional<Double> thresholdIfApplicable()
(sequential implementation only)
For the vast majority of students, a threshold value will not be applicable to their solution and thus this method need not be changed:
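public static Optional<Double> thresholdIfApplicable() {
    // Set to true only if your strategy relies on a threshold value.
    final boolean IS_THRESHOLD_APPLICABLE = false;
    if (IS_THRESHOLD_APPLICABLE) {
        // Replace with: return Optional.of(threshold);
        throw new NotYetImplementedException();
    } else {
        return Optional.empty();
    }
}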
However, if your solution has a threshold value which would be necessary for visualization and testing, return it here via Optional.of(threshold).
Again, you need not concern yourself with this method if you are going with a threshold-less strategy.
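The two possible outcomes of this method can be sketched as follows. This is only an illustration of java.util.Optional: the class name, the method names, and the MAX_DISTANCE value are hypothetical and not part of the studio code.

```java
import java.util.Optional;

class ThresholdSketch {
    // Hypothetical cutoff distance, chosen only for illustration.
    static final double MAX_DISTANCE = 100.0;

    /** Variant for a strategy that relies on a threshold. */
    static Optional<Double> thresholdWhenApplicable() {
        return Optional.of(MAX_DISTANCE);
    }

    /** Variant for a threshold-less strategy. */
    static Optional<Double> thresholdWhenNotApplicable() {
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(thresholdWhenApplicable().isPresent());    // prints true
        System.out.println(thresholdWhenNotApplicable().isPresent()); // prints false
    }
}
```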
produceEvidence(choleraDeaths)
method: public static Map<WaterPump, ? extends Number> produceEvidence(CholeraDeath[] choleraDeaths)
(sequential implementation only)
Note: Integer and Double extend Number.
Here is how you might leverage MapReduce to solve this problem.
If going the Integer route:
Mapper<CholeraDeath, WaterPump, Integer> mapper = null; // TODO
AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = null; // TODO: note you have already created a class which will suffice here.
MapReduceFramework<CholeraDeath, WaterPump, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
return framework.mapReduceAll(choleraDeaths);
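As one possible Integer-based strategy (an assumption, not necessarily the intended solution), each death could be mapped to its nearest pump with a value of 1, and the values then summed per pump. The sketch below expresses that idea with plain Java streams and hypothetical stand-in types rather than the course's Mapper, AccumulatorCombinerReducer, and MapReduceFramework, whose method signatures are not shown here. The pump names and coordinates are invented.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;

class IntegerRouteSketch {
    /** Hypothetical stand-in for the course's Location class. */
    static final class Location {
        final double x;
        final double y;

        Location(double x, double y) {
            this.x = x;
            this.y = y;
        }

        /** Euclidean distance; an assumption about the real distanceTo. */
        double distanceTo(Location other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    /** Hypothetical stand-in for WaterPump; coordinates are invented. */
    enum Pump {
        WEST(new Location(0.0, 0.0)),
        EAST(new Location(10.0, 0.0));

        final Location location;

        Pump(Location location) { this.location = location; }
    }

    /** The "map" step: choose the key (nearest pump) for one death. */
    static Pump nearestPump(Location death) {
        return Arrays.stream(Pump.values())
                .min(Comparator.comparingDouble(p -> p.location.distanceTo(death)))
                .get();
    }

    /** Groups deaths by nearest pump and sums a 1 for each (the "reduce" step). */
    static Map<Pump, Integer> countDeathsByNearestPump(Location[] deaths) {
        return Arrays.stream(deaths)
                .collect(Collectors.groupingBy(
                        IntegerRouteSketch::nearestPump,
                        Collectors.summingInt(death -> 1)));
    }

    public static void main(String[] args) {
        Location[] deaths = {
            new Location(1.0, 0.0), new Location(2.0, 1.0), new Location(9.0, 0.0)
        };
        System.out.println(countDeathsByNearestPump(deaths));
    }
}
```

With the three sample points, WEST is credited with 2 deaths and EAST with 1. Under this measure a higher count makes a pump more suspicious. Note that a pump that is never the nearest one gets no entry in the resulting map.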
If going the Double route:
Mapper<CholeraDeath, WaterPump, Double> mapper = null; // TODO
AccumulatorCombinerReducer<Double, ?, Double> accumulatorCombinerReducer = null; // TODO
MapReduceFramework<CholeraDeath, WaterPump, Double, ?, Double> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
return framework.mapReduceAll(choleraDeaths);
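A Double-based strategy might instead emit a distance for every (death, pump) pair and sum those distances per pump. The sketch below illustrates that idea with a plain loop and Map.merge; it is an assumption about one workable measure, not the intended solution. The types are hypothetical stand-ins for the course classes, and the pump names and coordinates are invented.

```java
import java.util.EnumMap;
import java.util.Map;

class DoubleRouteSketch {
    /** Hypothetical stand-in for the course's Location class. */
    static final class Location {
        final double x;
        final double y;

        Location(double x, double y) {
            this.x = x;
            this.y = y;
        }

        /** Euclidean distance; an assumption about the real distanceTo. */
        double distanceTo(Location other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    /** Hypothetical stand-in for WaterPump; coordinates are invented. */
    enum Pump {
        WEST(new Location(0.0, 0.0)),
        EAST(new Location(10.0, 0.0));

        final Location location;

        Pump(Location location) { this.location = location; }
    }

    /** Sums, for each pump, the distance from every death to that pump. */
    static Map<Pump, Double> totalDistanceToEachPump(Location[] deaths) {
        Map<Pump, Double> totals = new EnumMap<>(Pump.class);
        for (Location death : deaths) {
            for (Pump pump : Pump.values()) {
                // "map": emit (pump, distance); "reduce": add it to the running total.
                totals.merge(pump, pump.location.distanceTo(death), Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Location[] deaths = { new Location(1.0, 0.0), new Location(2.0, 0.0) };
        System.out.println(totalDistanceToEachPump(deaths));
    }
}
```

For the two sample points the totals are 3.0 for WEST and 17.0 for EAST. With this measure, a lower total would point to a more suspicious pump, so the value representation would need to reflect that.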
Alert: You are allowed to and encouraged to create additional class file(s) and/or use code from previous exercises.
Testing Your Solution
Visualization
Original Map Drawn By John Snow:
Our Visualization App:
class: CholeraOutbreakViz.java (VIZ)
package: mapreduce.apps.cholera.viz
source folder: student/src/main/java
Correctness
If you have chosen to go a different route than the ones we anticipated, do not worry about passing the test suite. Demo your work to an instructor in class and we can discuss its fitness.
class: _CholeraOutbreakTestSuite.java
package: mapreduce.apps.cholera.exercise
source folder: testing/src/test/java
Pledge, Acknowledgments, Citations
file: map-reduce-cholera-app-pledge-acknowledgments-citations.txt
More info about the Honor Pledge