MapReduce Frameworks Lab

From CSE231 Wiki

Revision as of 08:23, 3 March 2022

Bottlenecked MapReduce Framework


[Slide: Bottlenecked accumulateAll]

finishAll

method: Map<K, R> finishAll(Map<K, A> accumulateAllResult) (parallel implementation required)

This final step reduces the accumulated data and returns the final map in its reduced form. Again, you may notice that the method returns a Map<K, R> rather than the Map<K, A> returned by accumulateAll. The reason is the same as in accumulateAll: the framework is designed to handle cases in which the reduced data differs in type from the accumulated data.

To reduce the data, take the map returned from the accumulateAll stage and put the result of reducing each entry into a new map. The provided Collector will come in extremely handy for this stage, specifically its finisher, which you can obtain by calling the finisher() method. This step should run in parallel and will probably be the easiest of the three methods.
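The following is a minimal sketch of the finishAll idea, written as a stand-alone static method rather than the lab's actual instance method. It assumes the Collector is passed in as a java.util.stream.Collector (in the lab it is provided by the framework), and it uses a parallel stream writing into a ConcurrentHashMap in place of whatever parallel constructs the course expects. The sumCollector helper is hypothetical and exists only to show a collector whose accumulation type A (List<Integer>) differs from its result type R (Integer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collector;

// Hypothetical sketch; not the lab's required signature or implementation.
class FinishAllSketch {
    // Applies the collector's finisher to every accumulated value, in parallel.
    static <K, A, R> Map<K, R> finishAll(Map<K, A> accumulateAllResult, Collector<?, A, R> collector) {
        Function<A, R> finisher = collector.finisher();
        // ConcurrentHashMap tolerates concurrent puts (but rejects null keys and values).
        Map<K, R> result = new ConcurrentHashMap<>();
        // Each entry is finished independently, so entries can be processed in parallel.
        accumulateAllResult.entrySet().parallelStream()
                .forEach(entry -> result.put(entry.getKey(), finisher.apply(entry.getValue())));
        return result;
    }

    // Hypothetical example collector: accumulates values into a List, finishes to their sum.
    static Collector<Integer, List<Integer>, Integer> sumCollector() {
        return Collector.of(
                ArrayList::new,                                          // supplier: fresh container
                (list, value) -> list.add(value),                        // accumulator
                (left, right) -> { left.addAll(right); return left; },   // combiner
                list -> list.stream().mapToInt(Integer::intValue).sum()); // finisher: A -> R
    }
}
```

For example, finishing {a=[1, 2, 3], b=[10]} with sumCollector() produces {a=6, b=10}: the map's value type changes from List<Integer> to Integer, which is exactly why finishAll returns Map<K, R>.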

[Slide: Bottlenecked finishAll]

Matrix MapReduce Framework

class: MatrixMapReduceFramework.java
methods: mapAndAccumulateAll, combineAndFinishAll
package: mapreduce.framework.lab.matrix
source folder: student/src/main/java

Open the MatrixMapReduceFramework.java class; it contains two methods for you to complete: mapAndAccumulateAll and combineAndFinishAll. These frameworks are meant to be extremely general, so that they can be applied to many more specific uses of MapReduce.

The matrix framework is much more complex than the bottlenecked framework, but it boosts performance by grouping the map and accumulate stages together so that everything can run in parallel. It does so by slicing the given data into mapTaskCount slices and assigning each entry a reduce task number using the HashUtils toIndex() method. This, in effect, creates a matrix of dictionaries, hence the name of the framework. In the combineAndFinishAll stage, the matrix comes in handy by allowing us to go directly down each column of the matrix (since every occurrence of a given key lands in the same column, or bucket), combining and reducing elements all in one step. This concept was explained in more depth during class.
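To make the shape of the matrix concrete, here is a small sketch. The toIndex method below is a hypothetical stand-in for HashUtils.toIndex (the real helper may compute the index differently), and createMatrix is a hypothetical helper showing how to allocate a mapTaskCount by reduceTaskCount array of maps. Java does not allow generic array creation such as new Map<K, A>[r][c], so a raw array is created and cast.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the matrix layout; names are illustrative only.
class MatrixShapeSketch {
    // Hypothetical stand-in for HashUtils.toIndex: maps a key to a column in [0, reduceTaskCount).
    static int toIndex(Object key, int reduceTaskCount) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), reduceTaskCount);
    }

    // Allocates mapTaskCount rows (one per slice) by reduceTaskCount columns (one per bucket).
    @SuppressWarnings("unchecked")
    static <K, A> Map<K, A>[][] createMatrix(int mapTaskCount, int reduceTaskCount) {
        // Generic array creation is illegal in Java, so create a raw Map[][] and cast it.
        Map<K, A>[][] matrix = new Map[mapTaskCount][reduceTaskCount];
        for (int row = 0; row < mapTaskCount; row++) {
            for (int col = 0; col < reduceTaskCount; col++) {
                matrix[row][col] = new HashMap<>();
            }
        }
        return matrix;
    }
}
```

Because toIndex depends only on the key, every occurrence of a key, no matter which slice (row) produced it, ends up in the same column.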

mapAndAccumulateAll

method: Map<K, A>[][] mapAndAccumulateAll(E[] input) (parallel implementation required)

In this stage, you will map and accumulate a given array of data into a matrix of dictionaries. This method should run in parallel while performing the map and accumulate portions of the bottlenecked framework (which we recommend you complete before embarking on this one). As mentioned previously, the input should be sliced into mapTaskCount IndexedRanges, and each slice should then be mapped and accumulated into the appropriate dictionaries in the matrix. Although you could slice up the data into chunks yourself, we require that you use the same algorithm as the IndexedRange and Slices classes introduced earlier in the course. This allows us to provide better feedback so that you can pinpoint bugs sooner. What is the best way to use an algorithm identical to your Slices studio? Use your Slices studio, of course.
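As an illustration of what slicing an array into mapTaskCount contiguous ranges means, here is one common balanced scheme. RangeSketch and SlicingSketch are hypothetical names, and this is not guaranteed to match the course's Slices studio (for example, in how the remainder is distributed); for the lab itself, use your Slices studio as required above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an indexed range: a slice id plus the half-open range [min, maxExclusive).
final class RangeSketch {
    final int sliceIndex;
    final int min;
    final int maxExclusive;

    RangeSketch(int sliceIndex, int min, int maxExclusive) {
        this.sliceIndex = sliceIndex;
        this.min = min;
        this.maxExclusive = maxExclusive;
    }
}

// Hypothetical balanced slicer; the course's Slices class may differ in details.
class SlicingSketch {
    // Splits [0, length) into sliceCount contiguous ranges whose sizes differ by at most one.
    static List<RangeSketch> slice(int length, int sliceCount) {
        List<RangeSketch> ranges = new ArrayList<>();
        int base = length / sliceCount;
        int remainder = length % sliceCount;
        int min = 0;
        for (int i = 0; i < sliceCount; i++) {
            // In this scheme the first `remainder` slices each take one extra element.
            int size = base + (i < remainder ? 1 : 0);
            ranges.add(new RangeSketch(i, min, min + size));
            min += size;
        }
        return ranges;
    }
}
```

With this scheme, slice(10, 3) yields [0, 4), [4, 7), and [7, 10), with slice indices 0, 1, and 2.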

For each slice, map each input item and accumulate every resulting key-value pair into its appropriate cell (dictionary) in the matrix. Essentially, you will need to nest the actions of the accumulate method inside the mapper. To find where an entry belongs in the matrix, remember that each slice keeps track of its index id and that HashUtils has a toIndex method. Which is applicable to the row and which is applicable to the column?

Hint: The number of rows should match the number of slices.
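Putting these pieces together, a minimal sketch of mapAndAccumulateAll might look like the following. It is not the lab's required implementation: the mapper is modeled as a hypothetical BiConsumer<E, BiConsumer<K, V>> that emits key-value pairs, toIndex is a stand-in for HashUtils.toIndex, the inline slicing arithmetic is a stand-in for your Slices studio, and a parallel IntStream replaces the course's parallel constructs. The part worth studying is the nesting: each emitted pair is accumulated directly into matrix[row][toIndex(key)].

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical sketch; parameter types and helpers are assumptions, not the lab's API.
class MapAccumulateSketch {
    // Hypothetical stand-in for HashUtils.toIndex.
    static int toIndex(Object key, int reduceTaskCount) {
        return Math.floorMod(key.hashCode(), reduceTaskCount);
    }

    @SuppressWarnings("unchecked")
    static <E, K, V, A> Map<K, A>[][] mapAndAccumulateAll(
            E[] input,
            BiConsumer<E, BiConsumer<K, V>> mapper, // hypothetical: emits (key, value) pairs for an item
            Collector<V, A, ?> collector,
            int mapTaskCount,
            int reduceTaskCount) {
        Map<K, A>[][] matrix = new Map[mapTaskCount][reduceTaskCount];
        int base = input.length / mapTaskCount;
        int remainder = input.length % mapTaskCount;
        // One task per slice (row); each row is written by exactly one task, so HashMaps are safe.
        IntStream.range(0, mapTaskCount).parallel().forEach(row -> {
            for (int col = 0; col < reduceTaskCount; col++) {
                matrix[row][col] = new HashMap<>();
            }
            // Balanced contiguous range [min, max) for this slice (stand-in for Slices/IndexedRange).
            int min = row * base + Math.min(row, remainder);
            int max = min + base + (row < remainder ? 1 : 0);
            for (int i = min; i < max; i++) {
                // Accumulate is nested inside the map step: each emitted pair goes straight into its cell.
                mapper.accept(input[i], (key, value) -> {
                    Map<K, A> cell = matrix[row][toIndex(key, reduceTaskCount)];
                    A container = cell.computeIfAbsent(key, k -> collector.supplier().get());
                    collector.accumulator().accept(container, value);
                });
            }
        });
        return matrix;
    }
}
```

For a word count over {"a", "b", "a", "c", "a", "b"} with two slices, slice 0 sees a, b, a and slice 1 sees c, a, b, so the cell for "a" holds a count of 2 in row 0 and 1 in row 1, both in the same column.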

[Slide: Matrix mapAndAccumulateAll]

[Slide: Matrix mapAndAccumulateAll art]

combineAndFinishAll

method: Map<K, R> combineAndFinishAll(Map<K, A>[][] input) (parallel implementation required)

In this stage, you will take the matrix you just completed and combine its rows down into a single row: an array of maps, one per column. Afterward, you will convert this combined array of maps into one final map. This method should run in parallel.

As mentioned previously, you should go directly down each column of the matrix to access the same bucket across the different slices you created in the mapAndAccumulateAll step. For all of the maps in a column, go through each entry and combine it into a single map for that column. You will need to make use of the Collector’s finisher again, but you will also need to make use of its combiner, which you can access using the combiner() method. Although the combine step differs from the bottlenecked framework, the finish step should mirror what you did previously.

Hint: You can use the provided MultiWrapMap class to return the final row as a valid output. You should also combine before you finish.
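Under the same assumptions, a sketch of combineAndFinishAll follows. It processes each column in parallel, folds the cells of that column together with the collector's combiner via Map.merge, and only then applies the finisher. A ConcurrentHashMap stands in for the provided MultiWrapMap here (the class name CombineFinishSketch is hypothetical); because every key lands in exactly one column, no two column tasks ever write the same key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical sketch; the lab suggests MultiWrapMap instead of copying into one map.
class CombineFinishSketch {
    static <K, A, R> Map<K, R> combineAndFinishAll(Map<K, A>[][] input, Collector<?, A, R> collector) {
        BinaryOperator<A> combiner = collector.combiner();
        Function<A, R> finisher = collector.finisher();
        int reduceTaskCount = input[0].length; // assumes the matrix has at least one row
        Map<K, R> result = new ConcurrentHashMap<>();
        // Columns hold disjoint key sets, so each column can be handled by its own task.
        IntStream.range(0, reduceTaskCount).parallel().forEach(col -> {
            // Combine first: fold this column's cell from every row into one map.
            // Note: merge may keep and mutate a container from the input matrix if the
            // combiner modifies its first argument, as many combiners do.
            Map<K, A> combined = new HashMap<>();
            for (Map<K, A>[] row : input) {
                for (Map.Entry<K, A> entry : row[col].entrySet()) {
                    combined.merge(entry.getKey(), entry.getValue(), combiner);
                }
            }
            // Then finish: turn each combined container into its reduced result.
            for (Map.Entry<K, A> entry : combined.entrySet()) {
                result.put(entry.getKey(), finisher.apply(entry.getValue()));
            }
        });
        return result;
    }
}
```

For instance, with a counting collector, a single column whose rows hold {a=2, b=1} and {a=1, c=4} combines to {a=3, b=1, c=4} before finishing.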

[Slide: Matrix combineAndFinishAll]

Testing Your Solution

Correctness

There is a top-level test suite made up of sub-suites, each of which can be invoked separately when you want to focus on one part of the assignment.

class: FrameworksLabTestSuite.java
package: mapreduce.framework.lab
source folder: testing/src/test/java

Bottlenecked

class: BottleneckedFrameworkTestSuite.java
package: mapreduce.framework.lab.bottlenecked
source folder: testing/src/test/java

MapAll

class: BottleneckedMapAllTestSuite.java
package: mapreduce.framework.lab.bottlenecked
source folder: testing/src/test/java

AccumulateAll

class: BottleneckedAccumulateAllTestSuite.java
package: mapreduce.framework.lab.bottlenecked
source folder: testing/src/test/java

FinishAll

class: BottleneckedFinishAllTestSuite.java
package: mapreduce.framework.lab.bottlenecked
source folder: testing/src/test/java

Holistic

class: BottleneckedHolisticTestSuite.java
package: mapreduce.framework.lab.bottlenecked
source folder: testing/src/test/java

Matrix

class: MatrixFrameworkTestSuite.java
package: mapreduce.framework.lab.matrix
source folder: testing/src/test/java

MapAccumulateAll

class: MatrixMapAccumulateAllTestSuite.java
package: mapreduce.framework.lab.matrix
source folder: testing/src/test/java

CombineFinishAll

class: MatrixCombineFinishAllTestSuite.java
package: mapreduce.framework.lab.matrix
source folder: testing/src/test/java

Holistic

class: MatrixHolisticTestSuite.java
package: mapreduce.framework.lab.matrix
source folder: testing/src/test/java

Rubric

As always, please make sure to cite your work appropriately.

Total points: 100

Bottlenecked framework subtotal: 40

  • Correct mapAll (10)
  • Correct accumulateAll (20)
  • Correct finishAll (10)

Matrix framework subtotal: 60

  • Correct mapAndAccumulateAll (30)
  • Correct combineAndFinishAll (30)
