Difference between revisions of "MapReduce Reducer Assignment"

From CSE231 Wiki
Jump to navigation Jump to search
 
(38 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
credit for this assignment: Finn Voichick and Dennis Cosgrove
 
=Motivation=
 
=Motivation=
<code>interface Collector<T,A,R></code> is fundamental to the MapReduce Frameworks lab.  Your frameworks will be general enough such that MapReduce is just a subset of what you will support.
+
{{AccumulatorCombinerReducerLink}} is fundamental to the MapReduce assignments, providing four of the five methods which allow clients to well use {{MapReduceFrameworkLink}}.
  
In this studio we will build a ClassicReducer Collector which will implement the MapReduce style of accumulating all of the emitted values per key in a List.
+
First, we will implement {{ClassicReducerLink}} which will provide the basic MapReduce functionality used by the [https://en.wikipedia.org/wiki/PageRank PageRank] algorithm which made [https://www.google.com/ Google]'s original search engine so effective.
  
We will also build a custom IntSumCollector to demonstrate the desired flexibility of the Collector interface.
+
Then, we will implement two AccumulatorCombinerReducers which can serve to sum mutable containers of Integers.  Thus, they will pair well with the previous [[#MapReduce_Mapper_Assignment|Mappers exercise]].
 +
* {{SummingIntClassicReducerLink}} will extend ClassicReducer. 
 +
* {{SummingIntEfficientAccumulatorCombinerReducerLink}} will use a far more efficient mutable container to get the job done.
  
 
=Background=
 
=Background=
We debated between creating our own custom Reducer<V,A,R> interface versus adopting the standard [https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html interface Collector<T,A,R>] from the standard Java streams framework as the basis for the MapReduce Frameworks Lab.   
+
The [http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html Java streams framework] uses {{CollectorLink}} to allow clients to customize the four methods to create, accumulate, combine, and reduce.  {{CollectorLink}} chose to add an extra level of indirection to allow a hyper-lambdafication style.  While there is nothing fundamentally wrong with this approach, it was eventually deemed more trouble than it was worth for CSE 231sTo learn more about Collector and its connection to AccumulatorCombinerReducer, check out this [[#Collector_Rosetta_Stone|Rosetta Stone]].
  
While It might have been slightly less confusing at the outset if we used something like this mythical non-existant interface below:
+
={{AccumulatorCombinerReducerLink}}=
 +
{{CollapsibleCode|AccumulatorCombinerReducer<V, A, R>|
 +
<syntaxhighlight lang="java">
 +
public interface AccumulatorCombinerReducer<V, A, R> {
 +
A createMutableContainer();
  
<nowiki>public interface Reducer<V, A, R> {
 
A createMutableContainer();
 
 
void accumulate(A container, V item);
 
void accumulate(A container, V item);
A combine(A containerA, A containerB);
+
 
 +
void combine(A containerA, A containerB);
 +
 
 
R reduce(A container);
 
R reduce(A container);
}</nowiki>
+
}
 +
</syntaxhighlight>}}
 +
==createMutableContainer()==
 +
We use createMutableContainer() to create a new mutable container.  For classic map reduce this would be a [https://docs.oracle.com/javase/8/docs/api/java/util/List.html List<V>].
  
we chose to go with the standard Collector<T,A,R> despite the extra level of indirection it requires (returning @Functional interfaces whose single abstract methods do the work).
+
==accumulate(container,item)==
 +
We use accumulate(container,item) to accumulate a value.  For classic map reduce this would add an item to a list.
  
We provide this mythical <code>class CollectorReducerAdapter</code> as a [https://en.wikipedia.org/wiki/Rosetta_Stone Rosetta Stone] of sorts in an effort to reveal what each method is responsible for:
+
==combine(containerA,containerB)==
 +
We use combine(containerA,containerB) to combine two accumulated mutable containers. You must combine containerB into containerA.  
  
<nowiki>public class CollectorReducerAdapter<V, A, R> implements Reducer<V, A, R> {
+
==reduce(container)==
private final Collector<V,A,R> collector;
+
We use reduce(container) to produce the distilled value of the container.
public CollectorReducerAdapter(Collector<V,A,R> collector) {
 
this.collector = collector;
 
}
 
@Override
 
public A createMutableContainer() {
 
return collector.supplier().get();
 
}
 
@Override
 
public void accumulate(A container, V item) {
 
collector.accumulator().accept(container, item);
 
}
 
@Override
 
public A combine(A containerA, A containerB) {
 
return collector.combiner().apply(containerA, containerB);
 
}
 
@Override
 
public R reduce(A container) {
 
return collector.finisher().apply(container);
 
}
 
}</nowiki>
 
  
 
=Code To Use=
 
=Code To Use=
[https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html interface Collector<T,A,R>]
+
{{ListDoc}}
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/Supplier.html interface Supplier<T>]
+
: {{ArrayListDoc}}
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/BiConsumer.html interface BiConsumer<T,U>]
+
: {{LinkedListDoc}}
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/BinaryOperator.html interface BinaryOperator<T>]
 
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/Function.html interface Function<T,R>]
 
  
 
[https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/mutable/MutableInt.html class MutableInt]
 
[https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/mutable/MutableInt.html class MutableInt]
Line 56: Line 46:
 
=Code To Implement=
 
=Code To Implement=
 
==ClassicReducer==
 
==ClassicReducer==
The classic MapReduce Collector will collect all of the emitted values in a [https://docs.oracle.com/javase/8/docs/api/java/util/List.html List].
+
The classic MapReduce approach always uses a List of <code>V</code>s as its mutable container.  This abstract class will prove useful in this exercise and in future ones.
 +
 
 +
{{CodeToImplement|ClassicReducer|createMutableContainer<br>accumulate<br>combine|mapreduce.apps.reducer.classic.exercise}}
 +
 
 +
===createMutableContainer===
 +
{{Sequential|List<V> createMutableContainer()}}
 +
 
 +
Construct a new instance of a List.  [https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html ArrayList] and [https://docs.oracle.com/javase/8/docs/api/java/util/LinkedList.html LinkedList] are two classes each of which implement the interface [https://docs.oracle.com/javase/8/docs/api/java/util/List.html List].
 +
 
 +
===accumulate===
 +
{{Sequential|void accumulate(List<V> container, V item)}}
 +
 
 +
Mutate the specified <code>container</code> by [https://docs.oracle.com/javase/8/docs/api/java/util/List.html#add-E- adding] the specified <code>item</code>.
 +
 
 +
===combine===
 +
{{Sequential|void combine(List<V> containerA, List<V> containerB)}}
 +
 
 +
Mutate the specified <code>containerA</code> by [https://docs.oracle.com/javase/8/docs/api/java/util/List.html#addAll-java.util.Collection- adding all] of the contents of the specified <code>containerB</code>.  Do NOT mutated containerB.
  
{{CodeToImplement|ClassicReducer|supplier<br>accumulator<br>combiner|mapreduce.collector.studio}}
+
==Classic Summing Int==
 +
{{CodeToImplement|SummingIntClassicReducer|reduce|mapreduce.apps.reducer.summingint.classic.exercise}}
  
{{Sequential|Supplier<List<V>> supplier()}}
+
{{Sequential|public Integer reduce(List<Integer> container)}}
  
{{Sequential|BiConsumer<List<V>, V> accumulator()}}
+
The reduce method is passed a list of integers which it should simply sum up and return.
  
{{Sequential|BinaryOperator<List<V>> combiner()}}
+
==Efficient Summing Int==
 +
MapReduce Apps like Word Count offer glaring opportunities to optimize the classic MapReduce append all of the 1s in a List and add them up later.  SummingIntEfficientAccumulatorCombinerReducer will use [https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/mutable/MutableInt.html MutableInt] to simply add the values as they come in.
  
==IntSumCollector==
+
{{CodeToImplement|SummingIntEfficientAccumulatorCombinerReducer|createMutableContainer<br>accumulate<br>combine<br>reduce|mapreduce.apps.reducer.summingint.efficient.exercise}}
{{CodeToImplement|IntSumCollector|supplier<br>accumulator<br>combiner<br>finisher|mapreduce.collector.intsum.studio}}
 
  
{{Sequential|public Supplier<MutableInt> supplier()}}
+
{{Sequential|MutableInt createMutableContainer()}}
  
{{Sequential|public BiConsumer<MutableInt, Integer> accumulator()}}
+
{{Sequential|void accumulate(MutableInt container, Integer item)}}
  
{{Sequential|public BinaryOperator<MutableInt> combiner()}}
+
{{Sequential|void combine(MutableInt containerA, MutableInt containerB)}}
  
{{Sequential|public Function<MutableInt, Integer> finisher()}}
+
{{Sequential|reduce(MutableInt container)}}
  
 
=Testing Your Solution=
 
=Testing Your Solution=
==Correctness==
+
'''Note:''' in order to effectively assess this assignment, the testing leverages the WordCountMapper.  Be sure to correctly complete that assignment first.
===top level===
 
{{TestSuite|CollectorStudioTestSuite|mapreduce}}
 
===sub===
 
{{TestSuite|ClassicReducerTestSuite|mapreduce}}
 
  
{{TestSuite|IntSumCollectorTestSuite|mapreduce}}
+
{{TestSuite|_AccumulatorCombinerReducerTestSuite|mapreduce.apps}}

Latest revision as of 21:16, 1 April 2024

credit for this assignment: Finn Voichick and Dennis Cosgrove

Motivation

interface AccumulatorCombinerReducer<V,A,R> is fundamental to the MapReduce assignments, providing four of the five methods which allow clients to well use interface MapReduceFramework<E,K,V,A,R>.

First, we will implement abstract class ClassicReducer<V,R> which will provide the basic MapReduce functionality used by the PageRank algorithm which made Google's original search engine so effective.

Then, we will implement two AccumulatorCombinerReducers which can serve to sum mutable containers of Integers. Thus, they will pair well with the previous Mappers exercise.

Background

The Java streams framework uses interface Collector<T,A,R> to allow clients to customize the four methods to create, accumulate, combine, and reduce. interface Collector<T,A,R> chose to add an extra level of indirection to allow a hyper-lambdafication style. While there is nothing fundamentally wrong with this approach, it was eventually deemed more trouble than it was worth for CSE 231s. To learn more about Collector and its connection to AccumulatorCombinerReducer, check out this Rosetta Stone.

interface AccumulatorCombinerReducer<V,A,R>

AccumulatorCombinerReducer<V, A, R>  
public interface AccumulatorCombinerReducer<V, A, R> {
	A createMutableContainer();

	void accumulate(A container, V item);

	void combine(A containerA, A containerB);

	R reduce(A container);
}

createMutableContainer()

We use createMutableContainer() to create a new mutable container. For classic map reduce this would be a List<V>.

accumulate(container,item)

We use accumulate(container,item) to accumulate a value. For classic map reduce this would add an item to a list.

combine(containerA,containerB)

We use combine(containerA,containerB) to combine two accumulated mutable containers. You must combine containerB into containerA.

reduce(container)

We use reduce(container) to produce the distilled value of the container.

Code To Use

List<E>

class ArrayList<E>
class LinkedList<E>

class MutableInt

Code To Implement

ClassicReducer

The classic MapReduce approach always uses a List of Vs as its mutable container. This abstract class will prove useful in this exercise and in future ones.

class: ClassicReducer.java Java.png
methods: createMutableContainer
accumulate
combine
package: mapreduce.apps.reducer.classic.exercise
source folder: student/src/main/java

createMutableContainer

method: List<V> createMutableContainer() Sequential.svg (sequential implementation only)

Construct a new instance of a List. ArrayList and LinkedList are two classes each of which implement the interface List.

accumulate

method: void accumulate(List<V> container, V item) Sequential.svg (sequential implementation only)

Mutate the specified container by adding the specified item.

combine

method: void combine(List<V> containerA, List<V> containerB) Sequential.svg (sequential implementation only)

Mutate the specified containerA by adding all of the contents of the specified containerB. Do NOT mutated containerB.

Classic Summing Int

class: SummingIntClassicReducer.java Java.png
methods: reduce
package: mapreduce.apps.reducer.summingint.classic.exercise
source folder: student/src/main/java

method: public Integer reduce(List<Integer> container) Sequential.svg (sequential implementation only)

The reduce method is passed a list of integers which it should simply sum up and return.

Efficient Summing Int

MapReduce Apps like Word Count offer glaring opportunities to optimize the classic MapReduce append all of the 1s in a List and add them up later. SummingIntEfficientAccumulatorCombinerReducer will use MutableInt to simply add the values as they come in.

class: SummingIntEfficientAccumulatorCombinerReducer.java Java.png
methods: createMutableContainer
accumulate
combine
reduce
package: mapreduce.apps.reducer.summingint.efficient.exercise
source folder: student/src/main/java

method: MutableInt createMutableContainer() Sequential.svg (sequential implementation only)

method: void accumulate(MutableInt container, Integer item) Sequential.svg (sequential implementation only)

method: void combine(MutableInt containerA, MutableInt containerB) Sequential.svg (sequential implementation only)

method: reduce(MutableInt container) Sequential.svg (sequential implementation only)

Testing Your Solution

Note: in order to effectively assess this assignment, the testing leverages the WordCountMapper. Be sure to correctly complete that assignment first.

class: _AccumulatorCombinerReducerTestSuite.java Junit.png
package: mapreduce.apps
source folder: testing/src/test/java