Difference between revisions of "MapReduce Reducer Assignment"

From CSE231 Wiki
Jump to navigation Jump to search
 
(19 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
credit for this assignment: Finn Voichick and Dennis Cosgrove
 
credit for this assignment: Finn Voichick and Dennis Cosgrove
 
=Motivation=
 
=Motivation=
<code>interface Reducer<T,A,R></code> is fundamental to the MapReduce Frameworks lab.  Your frameworks will be general enough such that MapReduce is just a subset of what you will support.
+
{{AccumulatorCombinerReducerLink}} is fundamental to the MapReduce assignments, providing four of the five methods which allow clients to well use {{MapReduceFrameworkLink}}.
  
In this studio we will build a ListAccumulatingReducer which will implement the MapReduce style of accumulating all of the emitted values per key in a List.
+
First, we will implement {{ClassicReducerLink}} which will provide the basic MapReduce functionality used by the [https://en.wikipedia.org/wiki/PageRank PageRank] algorithm which made [https://www.google.com/ Google]'s original search engine so effective.
  
We will also build a custom IntSumEfficientReducer to demonstrate the desired flexibility of the Reducer interface.
+
Then, we will implement two AccumulatorCombinerReducers which can serve to sum mutable containers of Integers.  Thus, they will pair well with the previous [[#MapReduce_Mapper_Assignment|Mappers exercise]].
 +
* {{SummingIntClassicReducerLink}} will extend ClassicReducer. 
 +
* {{SummingIntEfficientAccumulatorCombinerReducerLink}} will use a far more efficient mutable container to get the job done.
  
 
=Background=
 
=Background=
We have chosen to use a non-lambdafied Reducer<V,A,R> interface versus adopting the standard [https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html interface Collector<T,A,R>] from the standard [http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html Java streams framework] as the basis for the MapReduce Frameworks Lab.   
+
The [http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html Java streams framework] uses {{CollectorLink}} to allow clients to customize the four methods to create, accumulate, combine, and reduce.  {{CollectorLink}} chose to add an extra level of indirection to allow a hyper-lambdafication style.  While there is nothing fundamentally wrong with this approach, it was eventually deemed more trouble than it was worth for CSE 231sTo learn more about Collector and its connection to AccumulatorCombinerReducer, check out this [[#Collector_Rosetta_Stone|Rosetta Stone]].
  
== CSE 231 Selection: Reducer ==
+
={{AccumulatorCombinerReducerLink}}=
<nowiki>public interface Reducer<V, A, R> {
+
{{CollapsibleCode|AccumulatorCombinerReducer<V, A, R>|
 +
<syntaxhighlight lang="java">
 +
public interface AccumulatorCombinerReducer<V, A, R> {
 
A createMutableContainer();
 
A createMutableContainer();
 +
 
void accumulate(A container, V item);
 
void accumulate(A container, V item);
A combine(A containerA, A containerB);
 
R reduce(A container);
 
}</nowiki>
 
  
== Java Streams Collector ==
+
void combine(A containerA, A containerB);
<nowiki>public interface Collector<T, A, R> {
 
  
// invoke supplier().get() to create a new mutable container
+
R reduce(A container);
Supplier<A> supplier();
+
}
 +
</syntaxhighlight>}}
 +
==createMutableContainer()==
 +
We use createMutableContainer() to create a new mutable container.  For classic map reduce this would be a [https://docs.oracle.com/javase/8/docs/api/java/util/List.html List<V>].
  
// invoke accumulator().accept(container, item) to add item to a container
+
==accumulate(container,item)==
BiConsumer<A, T> accumulator();
+
We use accumulate(container,item) to accumulate a value.  For classic map reduce this would add an item to a list.
  
// invoke combiner().apply(containerA, containerB) to combine one container into the other
+
==combine(containerA,containerB)==
BinaryOperator<A> combiner();
+
We use combine(containerA,containerB) to combine two accumulated mutable containers.  You must combine containerB into containerA.
  
// invoke finisher().apply(container) to reduce a container to its final form
+
==reduce(container)==
Function<A, R> finisher();
+
We use reduce(container) to produce the distilled value of the container.
}</nowiki>
 
  
[https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html interface Collector<T,A,R>]
+
=Code To Use=
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/Supplier.html interface Supplier<T>]
+
{{ListDoc}}
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/BiConsumer.html interface BiConsumer<T,U>]
+
: {{ArrayListDoc}}
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/BinaryOperator.html interface BinaryOperator<T>]
+
: {{LinkedListDoc}}
: [https://docs.oracle.com/javase/8/docs/api/java/util/function/Function.html interface Function<T,R>]
 
  
== Rosetta Stone ==
+
[https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/mutable/MutableInt.html class MutableInt]
  
<nowiki> public static <V, A, R> Collector<V, A, R> toCollector(Reducer<V, A, R> reducer) {
+
=Code To Implement=
return new Collector<V, A, R>() {
+
==ClassicReducer==
@Override
+
The classic MapReduce approach always uses a List of <code>V</code>s as its mutable container.  This abstract class will prove useful in this exercise and in future ones.
public Supplier<A> supplier() {
 
return () -> reducer.createMutableContainer();
 
}
 
  
@Override
+
{{CodeToImplement|ClassicReducer|createMutableContainer<br>accumulate<br>combine|mapreduce.apps.reducer.classic.exercise}}
public BiConsumer<A, V> accumulator() {
 
return (container, item) -> reducer.accumulate(container, item);
 
}
 
  
@Override
+
===createMutableContainer===
public BinaryOperator<A> combiner() {
+
{{Sequential|List<V> createMutableContainer()}}
return (a, b) -> reducer.combine(a, b);
 
}
 
  
@Override
+
Construct a new instance of a List.  [https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html ArrayList] and [https://docs.oracle.com/javase/8/docs/api/java/util/LinkedList.html LinkedList] are two classes each of which implement the interface [https://docs.oracle.com/javase/8/docs/api/java/util/List.html List].
public Function<A, R> finisher() {
 
return (container) -> reducer.reduce(container);
 
}
 
  
@Override
+
===accumulate===
public Set<Characteristics> characteristics() {
+
{{Sequential|void accumulate(List<V> container, V item)}}
return reducer.collectorCharacteristics();
 
}
 
};
 
}
 
 
 
public static <V, A, R> Reducer<V, A, R> toReducer(Collector<V, A, R> collector) {
 
return new Reducer<V, A, R>() {
 
@Override
 
public A createMutableContainer() {
 
return collector.supplier().get();
 
}
 
 
 
@Override
 
public void accumulate(A container, V item) {
 
collector.accumulator().accept(container, item);
 
}
 
 
 
@Override
 
public A combine(A containerA, A containerB) {
 
return collector.combiner().apply(containerA, containerB);
 
}
 
 
 
@Override
 
public R reduce(A container) {
 
return collector.finisher().apply(container);
 
}
 
 
 
@Override
 
public Set<Characteristics> collectorCharacteristics() {
 
return collector.characteristics();
 
}
 
};
 
}
 
</nowiki>
 
 
 
==methods==
 
=== createMutableContainer a.k.a. supplier get===
 
We use createMutableContainer() to create a new mutable container.  For classic map reduce this would be a [https://docs.oracle.com/javase/8/docs/api/java/util/List.html List<V>].
 
 
 
rosetta stone: <code>container = collector.supplier().get()</code> <math>\leftrightarrow</math> <code>container = reducer.createMutableContainer()</code>
 
 
 
=== accumulate a.k.a. accumulator accept===
 
We use accumulate(container,item) to accumulate a value.  For classic map reduce this would add an item to a list.
 
 
 
rosetta stone: <code>collector.accumulator().accept(container,item);</code> <math>\leftrightarrow</math> <code>reducer.accumulate(container,item)</code>
 
  
=== combine a.k.a. combiner apply===
+
Mutate the specified <code>container</code> by [https://docs.oracle.com/javase/8/docs/api/java/util/List.html#add-E- adding] the specified <code>item</code>.
We use combine(containerA,containerB) to combine two accumulators. You may combine containerB into containerA or containerA into containerB. Just return whichever is the combined result.  
 
  
rosetta stone: <code>collector.combiner().apply(containerA,containerB)</code> <math>\leftrightarrow</math> <code>reducer.combine(containerA,containerB)</code>
+
===combine===
 +
{{Sequential|void combine(List<V> containerA, List<V> containerB)}}
  
=== reduce a.k.a. finisher apply===
+
Mutate the specified <code>containerA</code> by [https://docs.oracle.com/javase/8/docs/api/java/util/List.html#addAll-java.util.Collection- adding all] of the contents of the specified <code>containerB</code>.  Do NOT mutated containerB.
We use reduce(container) to reduce an accumulator.
 
  
rosetta stone: <code>collector.finisher().apply(container)</code> <math>\leftrightarrow</math> <code>r = reducer.reduce(container)</code>
+
==Classic Summing Int==
 +
{{CodeToImplement|SummingIntClassicReducer|reduce|mapreduce.apps.reducer.summingint.classic.exercise}}
  
=Code To Use=
+
{{Sequential|public Integer reduce(List<Integer> container)}}
[https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/mutable/MutableInt.html class MutableInt]
 
 
 
=Code To Implement=
 
==ListAccumulatingReducer==
 
The classic MapReduce Collector will collect all of the emitted values in a [https://docs.oracle.com/javase/8/docs/api/java/util/List.html List].
 
 
 
{{CodeToImplement|ListAccumulatingReducer|createMutableContainer<br>accumulate<br>combine|mapreduce.apps.reducer.listaccumulating.exercise}}
 
 
 
{{Sequential|List<V> createMutableContainer()}}
 
 
 
{{Sequential|void accumulate(List<V> container, V item)}}
 
  
{{Sequential|List<V> combine(List<V> containerA, List<V> containerB)}}
+
The reduce method is passed a list of integers which it should simply sum up and return.
  
==IntSumEfficientReducer==
+
==Efficient Summing Int==
MapReduce Apps like Word Count offer glaring opportunities to optimize the classic MapReduce append all of the 1s in a List and add them up later.  In this section of the studio you will use [https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/mutable/MutableInt.html MutableInt] to simply add the values as they come in.
+
MapReduce Apps like Word Count offer glaring opportunities to optimize the classic MapReduce append all of the 1s in a List and add them up later.  SummingIntEfficientAccumulatorCombinerReducer will use [https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/mutable/MutableInt.html MutableInt] to simply add the values as they come in.
  
{{CodeToImplement|IntSumEfficientReducer|createMutableContainer<br>accumulate<br>accumulate<br>reduce|mapreduce.apps.reducer.efficient.intsum.exercise}}
+
{{CodeToImplement|SummingIntEfficientAccumulatorCombinerReducer|createMutableContainer<br>accumulate<br>combine<br>reduce|mapreduce.apps.reducer.summingint.efficient.exercise}}
  
 
{{Sequential|MutableInt createMutableContainer()}}
 
{{Sequential|MutableInt createMutableContainer()}}
Line 146: Line 81:
 
{{Sequential|void accumulate(MutableInt container, Integer item)}}
 
{{Sequential|void accumulate(MutableInt container, Integer item)}}
  
{{Sequential|MutableInt combine(MutableInt containerA, MutableInt containerB)}}
+
{{Sequential|void combine(MutableInt containerA, MutableInt containerB)}}
  
 
{{Sequential|reduce(MutableInt container)}}
 
{{Sequential|reduce(MutableInt container)}}
  
 
=Testing Your Solution=
 
=Testing Your Solution=
==Correctness==
+
'''Note:''' in order to effectively assess this assignment, the testing leverages the WordCountMapper.  Be sure to correctly complete that assignment first.
===top level===
 
{{TestSuite|_ReducerExerciseTestSuite|mapreduce}}
 
===sub===
 
{{TestSuite|_ListAccumulatingReducerTestSuite|mapreduce}}
 
  
{{TestSuite|_EfficientIntSumTestSuite|mapreduce}}
+
{{TestSuite|_AccumulatorCombinerReducerTestSuite|mapreduce.apps}}

Latest revision as of 21:16, 1 April 2024

credit for this assignment: Finn Voichick and Dennis Cosgrove

Motivation

interface AccumulatorCombinerReducer<V,A,R> is fundamental to the MapReduce assignments, providing four of the five methods which allow clients to well use interface MapReduceFramework<E,K,V,A,R>.

First, we will implement abstract class ClassicReducer<V,R> which will provide the basic MapReduce functionality used by the PageRank algorithm which made Google's original search engine so effective.

Then, we will implement two AccumulatorCombinerReducers which can serve to sum mutable containers of Integers. Thus, they will pair well with the previous Mappers exercise.

Background

The Java streams framework uses interface Collector<T,A,R> to allow clients to customize the four methods to create, accumulate, combine, and reduce. interface Collector<T,A,R> chose to add an extra level of indirection to allow a hyper-lambdafication style. While there is nothing fundamentally wrong with this approach, it was eventually deemed more trouble than it was worth for CSE 231s. To learn more about Collector and its connection to AccumulatorCombinerReducer, check out this Rosetta Stone.

interface AccumulatorCombinerReducer<V,A,R>

AccumulatorCombinerReducer<V, A, R>  
public interface AccumulatorCombinerReducer<V, A, R> {
	A createMutableContainer();

	void accumulate(A container, V item);

	void combine(A containerA, A containerB);

	R reduce(A container);
}

createMutableContainer()

We use createMutableContainer() to create a new mutable container. For classic map reduce this would be a List<V>.

accumulate(container,item)

We use accumulate(container,item) to accumulate a value. For classic map reduce this would add an item to a list.

combine(containerA,containerB)

We use combine(containerA,containerB) to combine two accumulated mutable containers. You must combine containerB into containerA.

reduce(container)

We use reduce(container) to produce the distilled value of the container.

Code To Use

List<E>

class ArrayList<E>
class LinkedList<E>

class MutableInt

Code To Implement

ClassicReducer

The classic MapReduce approach always uses a List of Vs as its mutable container. This abstract class will prove useful in this exercise and in future ones.

class: ClassicReducer.java Java.png
methods: createMutableContainer
accumulate
combine
package: mapreduce.apps.reducer.classic.exercise
source folder: student/src/main/java

createMutableContainer

method: List<V> createMutableContainer() Sequential.svg (sequential implementation only)

Construct a new instance of a List. ArrayList and LinkedList are two classes each of which implement the interface List.

accumulate

method: void accumulate(List<V> container, V item) Sequential.svg (sequential implementation only)

Mutate the specified container by adding the specified item.

combine

method: void combine(List<V> containerA, List<V> containerB) Sequential.svg (sequential implementation only)

Mutate the specified containerA by adding all of the contents of the specified containerB. Do NOT mutated containerB.

Classic Summing Int

class: SummingIntClassicReducer.java Java.png
methods: reduce
package: mapreduce.apps.reducer.summingint.classic.exercise
source folder: student/src/main/java

method: public Integer reduce(List<Integer> container) Sequential.svg (sequential implementation only)

The reduce method is passed a list of integers which it should simply sum up and return.

Efficient Summing Int

MapReduce Apps like Word Count offer glaring opportunities to optimize the classic MapReduce append all of the 1s in a List and add them up later. SummingIntEfficientAccumulatorCombinerReducer will use MutableInt to simply add the values as they come in.

class: SummingIntEfficientAccumulatorCombinerReducer.java Java.png
methods: createMutableContainer
accumulate
combine
reduce
package: mapreduce.apps.reducer.summingint.efficient.exercise
source folder: student/src/main/java

method: MutableInt createMutableContainer() Sequential.svg (sequential implementation only)

method: void accumulate(MutableInt container, Integer item) Sequential.svg (sequential implementation only)

method: void combine(MutableInt containerA, MutableInt containerB) Sequential.svg (sequential implementation only)

method: reduce(MutableInt container) Sequential.svg (sequential implementation only)

Testing Your Solution

Note: in order to effectively assess this assignment, the testing leverages the WordCountMapper. Be sure to correctly complete that assignment first.

class: _AccumulatorCombinerReducerTestSuite.java Junit.png
package: mapreduce.apps
source folder: testing/src/test/java