Difference between revisions of "MapReduce Frameworks Lab"

From CSE231 Wiki
Jump to navigation Jump to search
 
(201 intermediate revisions by 3 users not shown)
Line 1: Line 1:
The MapReduce assignment has many different pieces that come together to solve problems.  So much so that in the end writing each individual class or method can be less challenging than simply figuring out what each piece’s role should be in the larger puzzle.
+
[[Bottleneck_MapReduce_Framework_Assignment|Bottleneck MapReduce Framework]]
  
At a high level we have a framework, a mapper, a reducer, and some data we would like to process.
+
[[Matrix_MapReduce_Framework_Assignment|Matrix MapReduce Framework]]
  
Let’s start with the Framework.  At the end of the day, the first job of the Framework is mapAll() which unsurprisingly calls map() on the Mapper for every piece of data.  That’s it. Don’t overcomplicate it. The framework leaves all of the details of what (key, value) pairs to emit to the Mapper.  Just call map() for every piece of data.  Granted, it does need to create an instance of Context to receive those emissions.  For the SimpleFramework we end up creating a context that holds a single dictionary to handle all of the key, value pairs the Mapper throws at itThis IndividualContext leaves grouping to the next stage so it can simply associate the keys with the values and be done.  For the MatrixFramework, we end up mapping and grouping at once so its GroupStageSkippingContext is a bit more complicated, but the mapAndGroupAll() method on MatrixFramework stays nice and simple.  It call map() on the Mapper for each piece of data.  As stated before, don’t overcomplicate it.  You will want to create an n-way-split-esque separation of the data into tasks, each with its own GroupStageSkippingContext, but don’t lose sight of the goal: call map on the Mapper for each piece of the data.
+
<!--
 +
credit for this assignment: Finn Voichick and Dennis Cosgrove
 +
=Motivation=
 +
Dealing with [https://en.wikipedia.org/wiki/Big_data big data] is Hansel-level hot right nowWe will build two implementations to better understand the inner workings of Hadoop and Spark-like frameworksYour frameworks will actually be more general than just MapReduce.
  
Next up, let’s talk about the Mapper’s role.  The Mapper is application specific.  By that I mean, if we were distributing a system to compete with Hadoop we would build one MatrixFramework as well as we could and ship it.  Users of our system would then write any number of custom Mappers and Reducers to solve their particular problems.
+
The Matrix implementation gives us experience with dividing the work up into thread confined tasks whose results are then combined together.
  
The Mapper is passed two parameters: a context which knows how to receive (key,value) pairs and a piece of data to process.  By “process” we mean simply figure out what (key, value) pairs it wants to emit and emit them.  We have built a number of Mappers in this course.  The WordCountMapper processes a line of text and emits pairs of (key=word, value=1) for each word in the line.  The MutualFriendsMapper emits (key=friendPair, value=account’sFriendList) for each of the passed in account’s friends.  The CardMapper emits pairs of (key=suit, value=numericValue) for each card it is passed. The CholoraMapper is passed a residence location for someone who died in the outbreak and emits a single (key, value) pair: (key=the water pump closest to that location, value=1).
+
=Background=
 +
[https://en.wikipedia.org/wiki/MapReduce wikipedia article on MapReduce]
  
As far as the Context is concerned, its role is to handle whatever key,value pairs the Mapper emits at it. For the SimpleFramework we assume that no context will be emitted the same key twice so we can simply associate the key with its value in the contained dictionary.  For the MatrixFramework it is a bit more complicated. First we have to group the emissions as they arrive (as there can be duplicates) and second we need to separate them into the correct column based on their key’s hashCode.
+
As the name suggests, MapReduce refers to the process of mapping then reducing some stream of data. At its core, all a MapReduce algorithm entails is transforming a list of one kind of item before collecting those items and reducing them down to a single values per key using some computation. As you can probably tell, the concept of MapReduce is extremely general and can apply to a wide berth of problems. In this assignment, we will use MapReduce to simulate the Facebook mutual friends algorithm (finding the mutual friends between all friend pairs) as well as pinpointing the offending well in the 1854 Soho Cholera Outbreak.
  
As far as grouping goes, hopefully writing groupAll() in the SimpleFramework will set you up well for the grouping as you receive responsibilities of GroupStageSkippingMapContext.  At a high level, looking at the signature of groupAll() tells you what you need to do.  You are passed an array of IndividualMapContexts, each containing a dictionary of key, value pairs. You must return a single Dictionary that contains all of the keys you were passed in, each associated with a list of all the values you encounter.  The array of Contexts might obscure it a little, but at a high level one can think of it as: groupAll() is passed something akin to  List<Map<K,V> and must return a single Map<K,List<V>>.
+
For more information on the general concept of MapReduce, refer to [https://stackoverflow.com/questions/28982/simple-explanation-of-mapreduce this] article.
  
=Part 1 Simple Framework=
+
=Java Advice=
==[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/simple/StudentIndividualMapContext.html class StudentIndividualMapContext]==
+
==Parameterized Type Array Tip==
public void emit(K key, V value)
+
Creating arrays of parameterized types in Java is madness inducing.  Some details are available in [https://docs.oracle.com/javase/tutorial/java/generics/restrictions.html#createArrays Java Generics Restrictions].
  
==Word Count Application==
+
The example below creates an array of List<String>. The <code>@SuppressWarnings</code> annotation is optional.
===[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/apps/wordcount/WordCountMapper.html class WordCountMapper]===
+
<nowiki>@SuppressWarnings("unchecked")
public void map(MapContext<String, Integer> context, String[] lineOfWords)
+
List<String>[] result = new List[length];</nowiki>
  
===[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/apps/wordcount/WordCountReducer.html class WordCountReducer]===
+
==Use map.entrySet()==
public FinishAccumulator<Integer> createAccumulator()
+
Prefer the use of [https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#entrySet-- map.entrySet()] over map.keySet() followed by looking up the value with map.get(key).
  
===Test===
+
==Use the appropriate version of forall==
class TestWordCountApplicationWithInstructorFramework
+
there are many overloaded versions of forall including:
  
==Mutual Friends Application==
+
* [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/v5/V5.html#forall-int-int-edu.wustl.cse231s.v5.api.CheckedIntConsumer- range of ints from min to maxExclusive]
===[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/apps/friends/MutualFriendsMapper.html class MutualFriendsMapper]===
+
* [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/v5/V5.html#forall-T:A-edu.wustl.cse231s.v5.api.CheckedConsumer- array]
public void map(MapContext<OrderedPair<AccountId>, List<AccountId>> context, Account account)
+
* [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/v5/V5.html#forall-java.lang.Iterable-edu.wustl.cse231s.v5.api.CheckedConsumer- Iterable]
  
===[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/apps/friends/MutualFriendsReducer.html class MutualFriendsReducer]===
+
choose the correct one for each situation where "correct" is often the one that produces the cleanest code.
public FinishAccumulator<List<AccountId>> createAccumulator()
 
  
see: [[Habanero#FinishAccumulator_Creation|creating a custom reduction accumulator]] and [http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/edu/wustl/cse231s/mapreduce/apps/friends/ListIntersectionReducer.html ListIntersectionReducer<T>]
+
=Mistakes To Avoid=
 +
{{Warning | Arrays (and Matrices) are initially filled with null. You must fill them with instances.}}
  
===Test===
+
{{Warning | Ensure that your [[Slices|Slices studio]] is conforming to spec.}}
class TestMutualFriendsApplicationWithInstructorFramework
+
 
 +
=Code To Use=
 +
To allow our frameworks to work well with JDK8 Streams, we employ a couple of standard interfaces over creating our own custom ones.
 +
 
 +
==BiConsumer==
 +
We use the standard [https://docs.oracle.com/javase/8/docs/api/java/util/function/BiConsumer.html BiConsumer<T,U>] interface with an [https://docs.oracle.com/javase/8/docs/api/java/util/function/BiConsumer.html#accept-T-U- BiConsumer's accept(t,u)] method in the place of a mythical, custom MapContext<K,V> interface with an emit(key,value) method.
 +
 
 +
<nowiki>public interface BiConsumer<T, U> {
 +
    // invoke accept with each key and value you wish to emit
 +
    void accept(T t, U u);
 +
}</nowiki>
 +
 
 +
==Collector==
 +
read [https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html the javadoc for Collector]
 +
 
 +
check out the [[Collector_MapReduce_Studio#Background]]
 +
 
 +
==IndexedRange==
 +
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/slice/core/IndexedRange.html class IndexedRange]
 +
 
 +
This class has everything you need for n-way split problems, specifically:
 +
* [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/slice/core/IndexedRange.html#getSliceIndexId() getSliceIndexId()]
 +
* [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/slice/core/IndexedRange.html#getMinInclusive() getMinInclusive()]
 +
* [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/slice/core/IndexedRange.html#getMaxExclusive() getMaxExclusive()].
 +
 
 +
==Slices==
 +
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/slice/studio/Slices.html class Slices]
 +
: [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/slice/studio/Slices.html#createNSlices(C%5B%5D,int) <nowiki>List<Slice<C[]>> createNSlices(C[] data, int numSlices)</nowiki>]
 +
 
 +
==HashUtils==
 +
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/s20/apidocs/hash/studio/HashUtils.html#toIndex(java.lang.Object,int) toIndex(key,N)]
 +
 
 +
==MultiWrapMap==
 +
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/edu/wustl/cse231s/util/MultiWrapMap.html class MultiWrapMap<K,V>]
 +
 
 +
=A Path To Victory=
 +
* [[Int_Sum_MapReduce_Apps_Studio]]
 +
* [[Collector_MapReduce_Studio]]
 +
* [[Mutual_Friends_MapReduce_Application]]
 +
* [[#Optional_Warm_Up]]
 +
* [[#Bottlenecked_MapReduce_Framework]]
 +
* [[Cholera_MapReduce_Application]]
 +
* [[#Matrix_MapReduce_Framework]]
 +
 
 +
=The Core Questions=
 +
*What are the tasks?
 +
*What is the data?
 +
*Is the data mutable?
 +
*If so, how is it shared?
 +
 
 +
=Optional Warm Up=
 +
==WordCountConcreteStaticMapReduce==
 +
{{CodeToImplement|WordCountConcreteStaticMapReduce|mapAll<br>accumulateAll<br>finishAll|mapreduce.framework.warmup.wordcount}}
 +
===Mapper===
 +
====map (Provided)====
 +
<nowiki> static void map(TextSection textSection, BiConsumer<String, Integer> keyValuePairConsumer) {
 +
mapper.map(textSection, keyValuePairConsumer);
 +
}</nowiki>
 +
 
 +
===Reducer===
 +
====createMutableContainer (Provided)====
 +
<nowiki> static List<Integer> createMutableContainer() {
 +
return collector.supplier().get();
 +
}</nowiki>
 +
 
 +
====accumulate (Provided)====
 +
<nowiki> static void accumulate(List<Integer> list, int v) {
 +
collector.accumulator().accept(list, v);
 +
}</nowiki>
 +
 
 +
====combine (Provided)====
 +
<nowiki> static List<Integer> combine(List<Integer> a, List<Integer> b) {
 +
return collector.combiner().apply(a, b);
 +
}</nowiki>
 +
 
 +
====reduce (Provided)====
 +
<nowiki> static int reduce(List<Integer> list) {
 +
return collector.finisher().apply(list);
 +
}</nowiki>
 +
 
 +
===Framework===
 +
====mapAll====
 +
mapAll can be performed in parallel.  A task should be created for each item in the input array.  Each task should accept the emitted (key, value) pairs and store them in its own List to avoid data races.  These lists make up the array which is returned (one list for each item in the input array).
 +
 
 +
<youtube>-4prus5cNFo</youtube>
 +
 
 +
{{Parallel|List<KeyValuePair<String, Integer>>[] mapAll(TextSection[] input)}}
 +
 
 +
{{Warning| When first created arrays of Objects are filled with null.  You will need to assign each array index to a new List before you start the process of adding key-value pairs}}
 +
 
 +
{{Warning| Reminder: our course libraries consistently specify max to be exclusive.  This includes the parallel forall loop.}}
 +
 
 +
{{Tip|You are encouraged to utilize the provided [[#map_.28Provided.29|map]] method.}}
 +
 
 +
[[File:Map_all.svg]]
 +
 
 +
====accumulateAll====
 +
{{Sequential|static Map<String, List<Integer>> accumulateAll(List<KeyValuePair<String, Integer>>[] mapAllResults)}}
 +
 
 +
{{Tip|You are encouraged to utilize the provided [[#createMutableContainer_.28Provided.29|createMutableContainer]] and [[#accumulate_.28Provided.29|accumulate]] methods.}}
 +
 
 +
[[File:Accumulate_all.svg]]
 +
 
 +
====finishAll====
 +
{{Parallel|static Map<String, Integer> finishAll(Map<String, List<Integer>> accumulateAllResult)}}
 +
 
 +
{{Tip|You are encouraged to utilize the provided [[#reduce_.28Provided.29|reduce]] method.}}
 +
 
 +
[[File:Finish_all.svg]]
 +
 
 +
===Testing Your Solution===
 +
====Correctness====
 +
{{TestSuite|WarmUpWordCountMapReduceTestSuite|mapreduce}}
 +
 
 +
==MutualFriendsConcreteStaticMapReduce==
 +
{{CodeToImplement|MutualFriendsConcreteStaticMapReduce|map<br>reduceCreateList<br>reduceAccumulate<br>reduceCombine<br>reduceFinish<br>mapAll<br>accumulateAll<br>finishAll|mapreduce.framework.warmup.friends}}
 +
===Mapper===
 +
====map====
 +
{{Sequential|static void map(Account account, BiConsumer<OrderedPair<AccountId>, Set<AccountId>> keyValuePairConsumer)}}
 +
 
 +
===Reducer===
 +
====createMutableContainer====
 +
{{Sequential|static List<Set<AccountId>> createMutableContainer()}}
 +
 
 +
====accumulate====
 +
{{Sequential|static void accumulate(List<Set<AccountId>> list, Set<AccountId> v)}}
 +
 
 +
====combine====
 +
{{Sequential|static void combine(List<Set<AccountId>> a, List<Set<AccountId>> b)}}
 +
 
 +
====reduce====
 +
{{Sequential|static AccountIdMutableContainer reduce(List<Set<AccountId>> list)}}
 +
 
 +
===Framework===
 +
====mapAll====
 +
{{Parallel|static List<KeyValuePair<OrderedPair<AccountId>, Set<AccountId>>>[] mapAll(Account[] input)}}
 +
 
 +
{{Warning| When first created arrays of Objects are filled with null.  You will need to assigned each array index to a new List before you start the process of adding key-value pairs}}
 +
 
 +
{{Warning| Reminder: our course libraries consistently specify max to be exclusive.  This includes the parallel forall loop.}}
 +
 
 +
{{Tip|You are encouraged to utilize the [[#map|map]] method you implemented.}}
 +
====accumulateAll====
 +
{{Sequential|static Map<OrderedPair<AccountId>, List<Set<AccountId>>> accumulateAll(List<KeyValuePair<OrderedPair<AccountId>, Set<AccountId>>>[] mapAllResults)}}
 +
 
 +
{{Tip|You are encouraged to utilize the [[#reduceCreateList|reduceCreateList]] and [[#reduceAccumulate|reduceAccumulate]] methods you implemented.}}
 +
 
 +
====finishAll====
 +
{{Parallel|static Map<OrderedPair<AccountId>, MutualFriendIds> finishAll(Map<OrderedPair<AccountId>, List<Set<AccountId>>> accumulateAllResult)}}
 +
 
 +
{{Tip|You are encouraged to utilize the [[#reduceFinish|reduceFinish]] method you implemented.}}
 +
 
 +
===Testing Your Solution===
 +
====Correctness====
 +
{{TestSuite|WarmUpMutualFriendsMapReduceTestSuite|mapreduce}}
 +
 
 +
=Required Lab=
 +
 
 +
==Bottlenecked MapReduce Framework==
 +
 
 +
{{CodeToImplement|BottleneckedMapReduceFramework|mapAll<br>accumulateAll<br>finishAll|mapreduce.framework.lab.bottlenecked}}
 +
 
 +
Navigate to the <code>BottleneckedMapReduceFramework.java</code> class and there will be three methods for you to complete: mapAll, accumulateAll, and finishAll. These frameworks are meant to be extremely general and applied to more specific uses of MapReduce.  Whereas the warm ups for this lab serve to prepare you to build this required section of the lab, this bottlenecked framework is in many ways a warm up for the matrix implementation.
  
==class WordCountConcreteStaticMapReduce (optional but encouraged)==
 
 
===mapAll===
 
===mapAll===
private static IndividualMapContext<String, Integer>[] mapAll(String[][] S, WordCountMapper mapper) throws SuspendableException
+
NOTE: If you struggle to get through this method, you are strongly encouraged to try the warm-ups.
===groupAll===
+
 
private static Map<String, List<Integer>> groupAll(IndividualMapContext<String, Integer>[] f_of_S)
+
{{Parallel|List<KeyValuePair<K, V>>[] mapAll(E[] input)}}
===reduceAll===
+
 
private static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped_f_of_S, WordCountReducer reducer) throws SuspendableException
+
{{Warning| When first created arrays of Objects are filled with null.  You will need to assigned each array index to a new List before you start the process of adding key-value pairs}}
===Test===
+
 
class TestWordCountConcreteStaticFramework
+
{{Warning| Reminder: our course libraries consistently specify max to be exclusive.  This includes the parallel forall loop.}}
 +
 
 +
With this method, you will map all of the elements of an array of data into a new array of equivalent size consisting of Lists of key value pairs.  We will leverage the Mapper which is a field/instance variable on this BottleneckedFramework instance.  When invoking the mapper's map method with an element of the input array and a BiConsumer which will  accept each key and value passed to it, adding a KeyValuePair to its List. This list should then be added to the array of lists you previously defined, therefore completing the mapping stage of MapReduce. This should all be done in parallel.
 +
 
 +
Hint: you should create an array of lists equivalent in size to the original array. Each list will contain all of the emitted (key,value) pairs for its item.
 +
 
 +
[[File:Bottlenecked map all.png|400px]] [https://docs.google.com/presentation/d/1Gtpj6_5_8imUMccpwxmfrOxKSED791OKl4lZR6G2Bm4/pub?start=false&loop=false&delayms=3000&slide=id.g7ea283730e_0_130 slide]
 +
 
 +
===accumulateAll===
 +
{{Sequential|Map<K, A> accumulateAll(List<KeyValuePair<K, V>>[] mapAllResults)}}
 +
 
 +
This middle step is often excluded in more advanced MapReduce applications. When run in parallel, it is the only step of the framework that must be completed sequentially. In the matrix framework implementation, we will do away with this step altogether for the sake of performance.
  
==class MutualFriendsConcreteStaticMapReduce (optional but encouraged)==
+
In this method, you will take in the array of lists you previously created and accumulate the key value pairs in the lists into a newly defined map.  To help deal with this issue, you must make use of the Collector provided to you. More specifically, access the accumulator in the collector by calling the <code>accumulator()</code> method and accept the key/value pair when you add it to the map. You probably noticed that the method must return a map of <K, A>, which differs from the <K, V> generics fed into the method. The framework is designed this way as the data originally fed into the mapping stage can be collected into a mutable container before reaching the finish/reduce stage. In order to access the correct value for the map if the key has no associated value yet, use the supplier associated with the Collector with the <code>supplier()</code> method.
===mapAll===
+
 
private static IndividualMapContext<OrderedPair<AccountId>, List<AccountId>>[] mapAll(Account[] S, MutualFriendsMapper mapper) throws SuspendableException
+
[[File:Bottlenecked accumulate all.png|400px]] [https://docs.google.com/presentation/d/1Gtpj6_5_8imUMccpwxmfrOxKSED791OKl4lZR6G2Bm4/pub?start=false&loop=false&delayms=3000&slide=id.g7ea283730e_0_165 slide]
 +
 
 +
===finishAll===
 +
{{Parallel|Map<K, R> finishAll(Map<K, A> accumulateAllResult)}}
 +
 
 +
This final step reduces the accumulated data and returns the final map in its reduced form. Again, you may notice that the method returns a map of <K, R> instead of the <K, A> which was returned in the accumulateAll method. This happens for the exact same reason as the accumulateAll method, as the framework is designed to handle cases in which the reduced data differs in type from the accumulated data.
 +
 
 +
To reduce the data down, use the map returned from the accumulateAll stage and put the results of the reduction into a new map. The provided Collector will come in extremely handy for this stage, more specifically the finisher which can be called using the <code>finisher()</code> method. This step should run in parallel and will probably be the easiest of the three methods.
 +
 
 +
[[File:Bottlenecked finish all.png|400px]] [https://docs.google.com/presentation/d/1Gtpj6_5_8imUMccpwxmfrOxKSED791OKl4lZR6G2Bm4/pub?start=false&loop=false&delayms=3000&slide=id.g7ea283730e_0_221 slide]
 +
 
 +
==Matrix MapReduce Framework==
 +
 
 +
 
 +
{{CodeToImplement|MatrixMapReduceFramework|mapAndAccumulateAll<br>combineAndFinishAll|mapreduce.framework.lab.matrix}}
 +
 
 +
Navigate to the <code>MatrixMapReduceFramework.java</code> class and there will be two methods for you to complete: mapAndAccumulateAll and combineAndFinishAll. These frameworks are meant to be extremely general and applied to more specific uses of MapReduce.
 +
 
 +
The matrix framework is much more complex than the bottlenecked framework, but it boosts performance by grouping the map and accumulate stages so that everything can run in parallel. It does so by slicing up the given data into the specified mapTaskCount number of slices and assigns a reduce task number to each entry using the HashUtils toIndex() method. This, in effect, creates a matrix of dictionaries, hence the name of the framework. In the combineAndFinishAll stage, the matrix comes in handy by allowing us to go directly down the columns of the matrix (as each key is essentially grouped into a bucket), combining and reducing elements all-in-one. This concept was explained in more depth during class.
 +
 
 +
===mapAndAccumulateAll===
 +
{{Parallel|Map<K, A>[][] mapAndAccumulateAll(E[] input)}}
 +
 
 +
In this stage, you will map and accumulate a given array of data into a matrix of dictionaries. This method should run in parallel while performing the map and accumulate portions of the bottlenecked framework (which we recommend you complete prior to embarking on this mission). As mentioned previously, the input should be sliced into a mapTaskCount number of IndexedRanges and then mapped/accumulated into its appropriate dictionary in the matrix. Although you could slice up the data into chunks yourself, we require using an identical algorithm as performed the <code>IndexedRange</code> and <code>Slices</code> classes introduced earlier in the course.  This will allow us to provide better feedback to allow you to pinpoint bugs sooner.  What is the best way to perform an identical algorithm to your Slices studio?  Use your Slices studio, of course.
 +
 
 +
For each slice, the mapper should map the input into its appropriate cell in the matrix and accumulate it into that specific dictionary. Essentially, you will need to nestle the actions of the accumulate method into the mapper. In order to find where the input should go in the matrix, remember that each slice keeps track of its index id and HashUtils has a toIndex method.  Which is applicable to the row and which is applicable to the column?
 +
 
 +
Hint: The number of rows should match the number of slices.
 +
 
 +
[[File:Matrix map accumulate all.png|400px]] [https://docs.google.com/presentation/d/1bSLKsI5u2e_tFc0d-RSb0BIDwA-kD75U6yXfxvpMf6Y/pub?start=false&loop=false&delayms=3000&slide=id.g7ebdc248f6_0_347 slide]
 +
 
 +
[[File:Matrix map accumulate art.png|400px]] [https://docs.google.com/presentation/d/1bSLKsI5u2e_tFc0d-RSb0BIDwA-kD75U6yXfxvpMf6Y/pub?start=false&loop=false&delayms=3000&slide=id.g7ebdc248f6_0_388 slide]
 +
 
 +
===combineAndFinishAll===
 +
{{Parallel|Map<K, R> combineAndFinishAll(Map<K, A>[][] input)}}
 +
 
 +
In this stage, you will take the matrix you just completed and combine all of the separate rows down to one array. Afterward, you will convert this combined array of maps into one final map. This method should run in parallel.
 +
 
 +
As mentioned previously, you should go directly down the matrix to access the same bucket across the different slices you created in the mapAndAccumulateAll step. For all of the maps in a column, you should go through each entry and combine it down into one row. You will need to make use of the Collector’s finisher again, but you will also need to make use of the combiner. You can access the Collector’s combiner using the <code>combiner()</code> method. Although the combine step differs from the bottlenecked framework, the finish step should mirror what you did previously.
 +
 
 +
Hint: You can use the provided MultiWrapMap class to return the final row as a valid output. You should also combine before you finish.
 +
 
 +
[[File:Matrix combine finish all.png|400px]] [https://docs.google.com/presentation/d/1bSLKsI5u2e_tFc0d-RSb0BIDwA-kD75U6yXfxvpMf6Y/pub?start=false&loop=false&delayms=3000&slide=id.g7ebdc248f6_0_425 slide]
  
[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/hw4/mapAllIndividual.svg diagram]
+
=Testing Your Solution=
 +
==Correctness==
 +
There is a top-level test suite comprised of sub test suites which can be invoked separately when you want to focus on one part of the assignment.
 +
{{TestSuite|FrameworksLabTestSuite|mapreduce.framework.lab}}
 +
===Bottlenecked===
 +
{{TestSuite|BottleneckedFrameworkTestSuite|mapreduce.framework.lab.bottlenecked}}
 +
====MapAll====
 +
{{TestSuite|BottleneckedFrameworkTestSuite|mapreduce.framework.lab.bottlenecked}}
 +
====AccumulateAll====
 +
{{TestSuite|BottleneckedAccumulateAllTestSuite|mapreduce.framework.lab.bottlenecked}}
 +
====FinishAll====
 +
{{TestSuite|BottleneckedFinishAllTestSuite|mapreduce.framework.lab.bottlenecked}}
 +
====Holistic====
 +
{{TestSuite|BottleneckedHolisticTestSuite|mapreduce.framework.lab.bottlenecked}}
 +
===Matrix===
 +
{{TestSuite|MatrixFrameworkTestSuite|mapreduce.framework.lab.matrix}}
 +
====MapAccumulateAll====
 +
{{TestSuite|MatrixMapAccumulateAllTestSuite|mapreduce.framework.lab.matrix}}
 +
====CombineFinishAll====
 +
{{TestSuite|MatrixCombineFinishAllTestSuite|mapreduce.framework.lab.matrix}}
 +
====Holistic====
 +
{{TestSuite|MatrixHolisticTestSuite|mapreduce.framework.lab.matrix}}
  
===groupAll===
+
=Rubric=
private static Map<OrderedPair<AccountId>, List<List<AccountId>>> groupAll(IndividualMapContext<OrderedPair<AccountId>, List<AccountId>>[] f_of_S)
 
===reduceAll===
 
private static Map<OrderedPair<AccountId>, List<AccountId>> reduceAll(Map<OrderedPair<AccountId>, List<List<AccountId>>> grouped_f_of_S, MutualFriendsReducer reducer) throws SuspendableException
 
===Test===
 
class TestMutualFriendsConcreteStaticFramework
 
  
==[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/simple/SimpleMapReduceFramework.html SimpleMapReduceFramework<E,K,V>]==
+
As always, please make sure to cite your work appropriately.
===mapAll===
 
private IndividualMapContext<K,V>[] mapAll(E[] S, Mapper<E,K,V> mapper) throws SuspendableException
 
===groupAll===
 
private Map<K, List<V>> groupAll(IndividualMapContext<K,V>[] f_of_S)
 
===reduceAll===
 
private Map<K, V> reduceAll(Map<K, List<V>> grouped_f_of_S, Reducer<V> reducer) throws SuspendableException
 
===Test===
 
class TestMutualFriendsSolution
 
  
==[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/edu/wustl/cse231s/mapreduce/FinishAccumulatorBasedReducer.html FinishAccumulatorBasedReducer]==
+
Total points: 100
===reduce===
 
default V reduce( List<V> list ) throws SuspendableException
 
  
=Part 2 Matrix Framework=
+
Bottlenecked framework subtotal: 40
==[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/matrix/StudentGroupStageSkippingMapContext.html class StudentGroupStageSkippingMapContext<K,V>]==
+
* Correct mapAll (10)
===emit===
+
* Correct accumulateAll (20)
public void emit(K key, V value)
+
* Correct finishAll (10)
  
==[http://www.cse.wustl.edu/~cosgroved/courses/cse231/s17/javadocs/student/solution/mapreduce/matrix/MatrixMapReduceFramework.html class MatrixMapReduceFramework<E,K,V>]==
+
Matrix framework subtotal: 60
===mapGroupAll===
+
* Correct mapAndAccumulateAll (30)
private GroupStageSkippingMapContext<K, V>[] mapGroupAll(E[] S, Mapper<E, K, V> mapper) throws SuspendableException
+
* Correct combineAndFinishAll (30)
===reduceAll===
+
-->
private Map<K,V> reduceAll(GroupStageSkippingMapContext<K, V>[] grouped_f_of_S, Reducer<V> reducer) throws SuspendableException
 

Latest revision as of 08:23, 3 March 2022