Difference between revisions of "MapReduce Frameworks Lab"
Jump to navigation
Jump to search
(38 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | [[Bottleneck_MapReduce_Framework_Assignment|Bottleneck MapReduce Framework]] | ||
+ | |||
+ | [[Matrix_MapReduce_Framework_Assignment|Matrix MapReduce Framework]] | ||
+ | |||
+ | <!-- | ||
credit for this assignment: Finn Voichick and Dennis Cosgrove | credit for this assignment: Finn Voichick and Dennis Cosgrove | ||
=Motivation= | =Motivation= | ||
Line 38: | Line 43: | ||
=Code To Use= | =Code To Use= | ||
− | To allow our frameworks to work well with JDK8 Streams, we employ a couple of standard interfaces over creating | + | To allow our frameworks to work well with JDK8 Streams, we employ a couple of standard interfaces over creating our own custom ones. |
==BiConsumer== | ==BiConsumer== | ||
Line 77: | Line 82: | ||
* [[#Optional_Warm_Up]] | * [[#Optional_Warm_Up]] | ||
* [[#Bottlenecked_MapReduce_Framework]] | * [[#Bottlenecked_MapReduce_Framework]] | ||
− | |||
− | |||
− | |||
* [[Cholera_MapReduce_Application]] | * [[Cholera_MapReduce_Application]] | ||
* [[#Matrix_MapReduce_Framework]] | * [[#Matrix_MapReduce_Framework]] | ||
Line 92: | Line 94: | ||
==WordCountConcreteStaticMapReduce== | ==WordCountConcreteStaticMapReduce== | ||
{{CodeToImplement|WordCountConcreteStaticMapReduce|mapAll<br>accumulateAll<br>finishAll|mapreduce.framework.warmup.wordcount}} | {{CodeToImplement|WordCountConcreteStaticMapReduce|mapAll<br>accumulateAll<br>finishAll|mapreduce.framework.warmup.wordcount}} | ||
+ | ===Mapper=== | ||
+ | ====map (Provided)==== | ||
+ | <nowiki> static void map(TextSection textSection, BiConsumer<String, Integer> keyValuePairConsumer) { | ||
+ | mapper.map(textSection, keyValuePairConsumer); | ||
+ | }</nowiki> | ||
+ | |||
+ | ===Reducer=== | ||
+ | ====createMutableContainer (Provided)==== | ||
+ | <nowiki> static List<Integer> createMutableContainer() { | ||
+ | return collector.supplier().get(); | ||
+ | }</nowiki> | ||
+ | |||
+ | ====accumulate (Provided)==== | ||
+ | <nowiki> static void accumulate(List<Integer> list, int v) { | ||
+ | collector.accumulator().accept(list, v); | ||
+ | }</nowiki> | ||
+ | |||
+ | ====combine (Provided)==== | ||
+ | <nowiki> static List<Integer> combine(List<Integer> a, List<Integer> b) { | ||
+ | return collector.combiner().apply(a, b); | ||
+ | }</nowiki> | ||
+ | |||
+ | ====reduce (Provided)==== | ||
+ | <nowiki> static int reduce(List<Integer> list) { | ||
+ | return collector.finisher().apply(list); | ||
+ | }</nowiki> | ||
+ | |||
===Framework=== | ===Framework=== | ||
====mapAll==== | ====mapAll==== | ||
+ | mapAll can be performed in parallel. A task should be created for each item in the input array. Each task should accept the emitted (key, value) pairs and store them in its own List to avoid data races. These lists make up the array which is returned (one list for each item in the input array). | ||
+ | |||
+ | <youtube>-4prus5cNFo</youtube> | ||
+ | |||
{{Parallel|List<KeyValuePair<String, Integer>>[] mapAll(TextSection[] input)}} | {{Parallel|List<KeyValuePair<String, Integer>>[] mapAll(TextSection[] input)}} | ||
− | {{Warning| When first created arrays of Objects are filled with null. You will need to | + | {{Warning| When first created arrays of Objects are filled with null. You will need to assign each array index to a new List before you start the process of adding key-value pairs}} |
+ | |||
+ | {{Warning| Reminder: our course libraries consistently specify max to be exclusive. This includes the parallel forall loop.}} | ||
+ | |||
+ | {{Tip|You are encouraged to utilize the provided [[#map_.28Provided.29|map]] method.}} | ||
+ | |||
+ | [[File:Map_all.svg]] | ||
====accumulateAll==== | ====accumulateAll==== | ||
{{Sequential|static Map<String, List<Integer>> accumulateAll(List<KeyValuePair<String, Integer>>[] mapAllResults)}} | {{Sequential|static Map<String, List<Integer>> accumulateAll(List<KeyValuePair<String, Integer>>[] mapAllResults)}} | ||
+ | |||
+ | {{Tip|You are encouraged to utilize the provided [[#createMutableContainer_.28Provided.29|createMutableContainer]] and [[#accumulate_.28Provided.29|accumulate]] methods.}} | ||
+ | |||
+ | [[File:Accumulate_all.svg]] | ||
====finishAll==== | ====finishAll==== | ||
{{Parallel|static Map<String, Integer> finishAll(Map<String, List<Integer>> accumulateAllResult)}} | {{Parallel|static Map<String, Integer> finishAll(Map<String, List<Integer>> accumulateAllResult)}} | ||
+ | |||
+ | {{Tip|You are encouraged to utilize the provided [[#reduce_.28Provided.29|reduce]] method.}} | ||
+ | |||
+ | [[File:Finish_all.svg]] | ||
===Testing Your Solution=== | ===Testing Your Solution=== | ||
Line 111: | Line 158: | ||
{{CodeToImplement|MutualFriendsConcreteStaticMapReduce|map<br>reduceCreateList<br>reduceAccumulate<br>reduceCombine<br>reduceFinish<br>mapAll<br>accumulateAll<br>finishAll|mapreduce.framework.warmup.friends}} | {{CodeToImplement|MutualFriendsConcreteStaticMapReduce|map<br>reduceCreateList<br>reduceAccumulate<br>reduceCombine<br>reduceFinish<br>mapAll<br>accumulateAll<br>finishAll|mapreduce.framework.warmup.friends}} | ||
===Mapper=== | ===Mapper=== | ||
+ | ====map==== | ||
{{Sequential|static void map(Account account, BiConsumer<OrderedPair<AccountId>, Set<AccountId>> keyValuePairConsumer)}} | {{Sequential|static void map(Account account, BiConsumer<OrderedPair<AccountId>, Set<AccountId>> keyValuePairConsumer)}} | ||
===Reducer=== | ===Reducer=== | ||
− | {{Sequential|static List<Set<AccountId>> | + | ====createMutableContainer==== |
+ | {{Sequential|static List<Set<AccountId>> createMutableContainer()}} | ||
− | {{Sequential|static void | + | ====accumulate==== |
+ | {{Sequential|static void accumulate(List<Set<AccountId>> list, Set<AccountId> v)}} | ||
− | {{Sequential|static void | + | ====combine==== |
+ | {{Sequential|static void combine(List<Set<AccountId>> a, List<Set<AccountId>> b)}} | ||
+ | |||
+ | ====reduce==== | ||
+ | {{Sequential|static AccountIdMutableContainer reduce(List<Set<AccountId>> list)}} | ||
− | |||
===Framework=== | ===Framework=== | ||
+ | ====mapAll==== | ||
{{Parallel|static List<KeyValuePair<OrderedPair<AccountId>, Set<AccountId>>>[] mapAll(Account[] input)}} | {{Parallel|static List<KeyValuePair<OrderedPair<AccountId>, Set<AccountId>>>[] mapAll(Account[] input)}} | ||
+ | {{Warning| When first created arrays of Objects are filled with null. You will need to assigned each array index to a new List before you start the process of adding key-value pairs}} | ||
+ | |||
+ | {{Warning| Reminder: our course libraries consistently specify max to be exclusive. This includes the parallel forall loop.}} | ||
+ | |||
+ | {{Tip|You are encouraged to utilize the [[#map|map]] method you implemented.}} | ||
+ | ====accumulateAll==== | ||
{{Sequential|static Map<OrderedPair<AccountId>, List<Set<AccountId>>> accumulateAll(List<KeyValuePair<OrderedPair<AccountId>, Set<AccountId>>>[] mapAllResults)}} | {{Sequential|static Map<OrderedPair<AccountId>, List<Set<AccountId>>> accumulateAll(List<KeyValuePair<OrderedPair<AccountId>, Set<AccountId>>>[] mapAllResults)}} | ||
+ | {{Tip|You are encouraged to utilize the [[#reduceCreateList|reduceCreateList]] and [[#reduceAccumulate|reduceAccumulate]] methods you implemented.}} | ||
+ | |||
+ | ====finishAll==== | ||
{{Parallel|static Map<OrderedPair<AccountId>, MutualFriendIds> finishAll(Map<OrderedPair<AccountId>, List<Set<AccountId>>> accumulateAllResult)}} | {{Parallel|static Map<OrderedPair<AccountId>, MutualFriendIds> finishAll(Map<OrderedPair<AccountId>, List<Set<AccountId>>> accumulateAllResult)}} | ||
+ | |||
+ | {{Tip|You are encouraged to utilize the [[#reduceFinish|reduceFinish]] method you implemented.}} | ||
===Testing Your Solution=== | ===Testing Your Solution=== | ||
Line 141: | Line 206: | ||
===mapAll=== | ===mapAll=== | ||
+ | NOTE: If you struggle to get through this method, you are strongly encouraged to try the warm-ups. | ||
+ | |||
{{Parallel|List<KeyValuePair<K, V>>[] mapAll(E[] input)}} | {{Parallel|List<KeyValuePair<K, V>>[] mapAll(E[] input)}} | ||
− | With this method, you will map all of the elements of an array of data into a new array of equivalent size consisting of | + | {{Warning| When first created arrays of Objects are filled with null. You will need to assigned each array index to a new List before you start the process of adding key-value pairs}} |
+ | |||
+ | {{Warning| Reminder: our course libraries consistently specify max to be exclusive. This includes the parallel forall loop.}} | ||
+ | |||
+ | With this method, you will map all of the elements of an array of data into a new array of equivalent size consisting of Lists of key value pairs. We will leverage the Mapper which is a field/instance variable on this BottleneckedFramework instance. When invoking the mapper's map method with an element of the input array and a BiConsumer which will accept each key and value passed to it, adding a KeyValuePair to its List. This list should then be added to the array of lists you previously defined, therefore completing the mapping stage of MapReduce. This should all be done in parallel. | ||
Hint: you should create an array of lists equivalent in size to the original array. Each list will contain all of the emitted (key,value) pairs for its item. | Hint: you should create an array of lists equivalent in size to the original array. Each list will contain all of the emitted (key,value) pairs for its item. | ||
Line 155: | Line 226: | ||
In this method, you will take in the array of lists you previously created and accumulate the key value pairs in the lists into a newly defined map. To help deal with this issue, you must make use of the Collector provided to you. More specifically, access the accumulator in the collector by calling the <code>accumulator()</code> method and accept the key/value pair when you add it to the map. You probably noticed that the method must return a map of <K, A>, which differs from the <K, V> generics fed into the method. The framework is designed this way as the data originally fed into the mapping stage can be collected into a mutable container before reaching the finish/reduce stage. In order to access the correct value for the map if the key has no associated value yet, use the supplier associated with the Collector with the <code>supplier()</code> method. | In this method, you will take in the array of lists you previously created and accumulate the key value pairs in the lists into a newly defined map. To help deal with this issue, you must make use of the Collector provided to you. More specifically, access the accumulator in the collector by calling the <code>accumulator()</code> method and accept the key/value pair when you add it to the map. You probably noticed that the method must return a map of <K, A>, which differs from the <K, V> generics fed into the method. The framework is designed this way as the data originally fed into the mapping stage can be collected into a mutable container before reaching the finish/reduce stage. In order to access the correct value for the map if the key has no associated value yet, use the supplier associated with the Collector with the <code>supplier()</code> method. | ||
− | |||
− | |||
− | |||
− | |||
[[File:Bottlenecked accumulate all.png|400px]] [https://docs.google.com/presentation/d/1Gtpj6_5_8imUMccpwxmfrOxKSED791OKl4lZR6G2Bm4/pub?start=false&loop=false&delayms=3000&slide=id.g7ea283730e_0_165 slide] | [[File:Bottlenecked accumulate all.png|400px]] [https://docs.google.com/presentation/d/1Gtpj6_5_8imUMccpwxmfrOxKSED791OKl4lZR6G2Bm4/pub?start=false&loop=false&delayms=3000&slide=id.g7ea283730e_0_165 slide] | ||
Line 192: | Line 259: | ||
[[File:Matrix map accumulate art.png|400px]] [https://docs.google.com/presentation/d/1bSLKsI5u2e_tFc0d-RSb0BIDwA-kD75U6yXfxvpMf6Y/pub?start=false&loop=false&delayms=3000&slide=id.g7ebdc248f6_0_388 slide] | [[File:Matrix map accumulate art.png|400px]] [https://docs.google.com/presentation/d/1bSLKsI5u2e_tFc0d-RSb0BIDwA-kD75U6yXfxvpMf6Y/pub?start=false&loop=false&delayms=3000&slide=id.g7ebdc248f6_0_388 slide] | ||
− | |||
===combineAndFinishAll=== | ===combineAndFinishAll=== | ||
Line 242: | Line 308: | ||
* Correct mapAndAccumulateAll (30) | * Correct mapAndAccumulateAll (30) | ||
* Correct combineAndFinishAll (30) | * Correct combineAndFinishAll (30) | ||
+ | --> |