Cholera MapReduce Application

Latest revision as of 03:35, 18 March 2024

John Snow memorial and pub.jpg

Motivation

Epidemiology is the important study of "why certain people are getting ill."

We get a chance to make sense of the data in a relatively open-ended studio.

Background

Video: Extra History: The Broad Street Pump  

Imagine you are a physician in 1854 London in the midst of a cholera outbreak. Your theory that contaminated water is the cause meets resistance from the medical establishment, which holds that cholera spreads through the air.

Imagine further that your friend Ada has taught you to program.

Lecture

Video: CSE 231s Lecture: Cholera  

Code To Use

You will be asked to produce evidence from CholeraDeath locations and WaterPump locations. The CholeraDeath data is from this GIS Analysis and is made available in Java in SohoCholeraOutbreak1854. Source: deaths.txt.

The WaterPump data is available via WaterPump.values(). Source: pumps.txt.

class Location

double distanceTo( Location other )
double x()
double y()

class CholeraDeath

Location location()

enum WaterPump

Location location()
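
As a rough illustration of how the methods listed above fit together, the sketch below finds the pump closest to a given location by looping over an enum's values() and comparing distanceTo() results. The Location and Pump types here are hypothetical stand-ins written for this example, not the course classes, and the pump coordinates are invented.

```java
public class NearestPumpSketch {
    // Hypothetical stand-in for the course's Location class.
    static final class Location {
        private final double x;
        private final double y;

        Location(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double x() { return x; }

        double y() { return y; }

        // Straight-line (Euclidean) distance between two locations.
        double distanceTo(Location other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    // Hypothetical stand-in for the WaterPump enum; coordinates are invented.
    enum Pump {
        NORTH(0.0, 10.0), CENTER(0.0, 0.0), EAST(10.0, 0.0);

        private final Location location;

        Pump(double x, double y) {
            this.location = new Location(x, y);
        }

        Location location() { return location; }
    }

    // Loops over every enum constant and keeps the one with the smallest distance.
    static Pump nearestPump(Location where) {
        Pump best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Pump pump : Pump.values()) {
            double distance = where.distanceTo(pump.location());
            if (distance < bestDistance) {
                best = pump;
                bestDistance = distance;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Location death = new Location(1.0, 2.0);
        System.out.println(nearestPump(death)); // prints CENTER
    }
}
```

Whether "nearest pump" is the right notion of evidence is a design decision left to you; this only shows the mechanics of combining values() with distanceTo().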

Code To Implement

In order to provide a more open-ended assignment, we have abstracted out different aspects of the studio. In particular, valueRepresentation() allows you to specify whether your solution treats high or low numbers as suspect. For example, returning ValueRepresentation.HIGH_NUMBERS_SUSPECT indicates that a higher number makes a water pump more likely to be the source of the cholera outbreak. This will aid the Visualization in presenting your findings.

We have anticipated two basic approaches to this problem (with a variation^2 on one of the approaches). You may choose to implement an integer-based or a double-based CholeraApp. If you take a different approach than one we have anticipated, that is fine. Get it checked out by an instructor.

CholeraOutbreak

class: CholeraOutbreak.java
methods: produceEvidence
package: mapreduce.apps.cholera.exercise
source folder: student/src/main/java

valueRepresentation

method: public static ValueRepresentation valueRepresentation() (sequential implementation only)

You have been provided with this implementation which should be sufficient:

public static ValueRepresentation valueRepresentation() {
    return ValueRepresentation.AUTO_DETECT;
}

thresholdIfApplicable

method: public static Optional<Double> thresholdIfApplicable() (sequential implementation only)

For the vast majority of students, a threshold value will not be applicable to their solution, and thus this method need not be changed:

public static Optional<Double> thresholdIfApplicable() {
	final boolean IS_THRESHOLD_APPLICABLE = true;
	if (IS_THRESHOLD_APPLICABLE) {
		throw new NotYetImplementedException();
	} else {
		return Optional.empty();
	}
}

However, if your solution has a threshold value which would be necessary for visualization and testing, return it here via Optional.of(threshold).

Again, you need not concern yourself with this method if you are going with a threshold-less strategy.
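
For readers less familiar with java.util.Optional, the sketch below contrasts the two possible return shapes: Optional.of(threshold) for a threshold-based strategy and Optional.empty() for a threshold-less one. The class name, method names, and the threshold value 150.0 are all invented for illustration and are not part of the provided code.

```java
import java.util.Optional;

public class ThresholdSketch {
    // Hypothetical threshold-based strategy; 150.0 is an invented value.
    static Optional<Double> thresholdBased() {
        double threshold = 150.0;
        return Optional.of(threshold);
    }

    // Threshold-less strategy: there is nothing to report.
    static Optional<Double> thresholdLess() {
        return Optional.empty();
    }

    // Shows how a caller could handle both cases without null checks.
    static String describe(Optional<Double> threshold) {
        return threshold.map(t -> "threshold = " + t).orElse("no threshold");
    }

    public static void main(String[] args) {
        System.out.println(describe(thresholdBased())); // prints threshold = 150.0
        System.out.println(describe(thresholdLess()));  // prints no threshold
    }
}
```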


produceEvidence(choleraDeaths)

method: public static Map<WaterPump, ? extends Number> produceEvidence(CholeraDeath[] choleraDeaths) (sequential implementation only)

Note: Integer and Double extend Number.

The following outlines show how to leverage MapReduce to solve this problem.

If going the Integer route:

		Mapper<CholeraDeath, WaterPump, Integer> mapper = null; // TODO
		AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = null; // TODO: note you have already created a class which will suffice here.
		MapReduceFramework<CholeraDeath, WaterPump, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
		return framework.mapReduceAll(choleraDeaths);

If going the Double route:

		Mapper<CholeraDeath, WaterPump, Double> mapper = null; // TODO
		AccumulatorCombinerReducer<Double, ?, Double> accumulatorCombinerReducer = null; // TODO
		MapReduceFramework<CholeraDeath, WaterPump, Double, ?, Double> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
		return framework.mapReduceAll(choleraDeaths);

Alert: You are allowed and encouraged to create additional class file(s) and/or use code from previous exercises.
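
To make the map and reduce roles concrete without relying on the course framework, here is a minimal sketch that uses only java.util and java.util.stream. It is not the course's Mapper, AccumulatorCombinerReducer, or StreamMapReduceFramework API. The Pump enum, its coordinates, and the sample points are invented, and the two strategies shown (counting deaths per nearest pump, and summing distances per pump) are just two possible notions of evidence. It also shows why a return type of Map<..., ? extends Number> accepts either an Integer-valued or a Double-valued map.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EvidenceSketch {
    // Hypothetical pumps with invented coordinates.
    enum Pump {
        A(0.0, 0.0), B(10.0, 0.0);

        final double x;
        final double y;

        Pump(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double distanceTo(double[] point) {
            return Math.hypot(x - point[0], y - point[1]);
        }
    }

    static Pump nearest(double[] point) {
        Pump best = Pump.A;
        for (Pump pump : Pump.values()) {
            if (pump.distanceTo(point) < best.distanceTo(point)) {
                best = pump;
            }
        }
        return best;
    }

    // Integer route: map each death to (nearest pump, 1), reduce by summing.
    static Map<Pump, Integer> countNearest(double[][] deaths) {
        return Arrays.stream(deaths).collect(Collectors.toMap(
                EvidenceSketch::nearest,          // key: nearest pump
                death -> 1,                       // value emitted per death
                Integer::sum,                     // reduce step
                () -> new EnumMap<>(Pump.class)));
    }

    // Double route: map each death to (pump, distance) for every pump, reduce by summing.
    static Map<Pump, Double> totalDistance(double[][] deaths) {
        Map<Pump, Double> totals = new EnumMap<>(Pump.class);
        for (double[] death : deaths) {
            for (Pump pump : Pump.values()) {
                totals.merge(pump, pump.distanceTo(death), Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        double[][] deaths = { {1.0, 0.0}, {2.0, 0.0}, {9.0, 0.0} };
        // Both maps fit the wildcard type because Integer and Double extend Number.
        Map<Pump, ? extends Number> counts = countNearest(deaths);
        Map<Pump, ? extends Number> distances = totalDistance(deaths);
        System.out.println(counts);    // prints {A=2, B=1}
        System.out.println(distances); // prints {A=12.0, B=18.0}
    }
}
```

In the first strategy a higher number makes a pump more suspect, while in the second a lower number does, which is the distinction that ValueRepresentation exists to communicate.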

Testing Your Solution

Visualization

Original Map Drawn By John Snow:

Snow-cholera-map-1.jpg

Our Visualization App:

class: CholeraOutbreakViz.java
package: mapreduce.apps.cholera.viz
source folder: student/src/main/java

CholeraOutbreak.png

Correctness

If you have chosen to go a different route than the ones we anticipated, do not worry about passing the test suite. Demo your work to an instructor in class and we can discuss its fitness.

class: _CholeraOutbreakTestSuite.java
package: mapreduce.apps.cholera.exercise
source folder: testing/src/test/java

Pledge, Acknowledgments, Citations

file: map-reduce-cholera-app-pledge-acknowledgments-citations.txt

More info about the Honor Pledge