Difference between revisions of "Collector Rosetta Stone"

Latest revision as of 16:33, 23 February 2023

The interface Collector<T,A,R> serves the standard Java streams framework for MapReduce-like tasks with added in-memory processing capability a la Apache Spark.

You will use interface AccumulatorCombinerReducer<V,A,R> for our MapReduce assignments. It is almost a one-to-one match with interface Collector<T,A,R>, but de-ultra-uber-hyper-mega-super-lambdafied.

One To One

CSE 231s: AccumulatorCombinerReducer<V, A, R>

public interface AccumulatorCombinerReducer<V, A, R> {
	A createMutableContainer();

	void accumulate(A container, V item);

	void combine(A containerA, A containerB);

	R reduce(A container);
}

Java Streams: Collector<T, A, R>

public interface Collector<T, A, R> {
    Supplier<A> supplier();

    BiConsumer<A, T> accumulator();

    BinaryOperator<A> combiner();

    Function<A, R> finisher();
}

Method by method:

createMutableContainer() <===> supplier()
accumulate(container, item) <===> accumulator()
combine(containerA, containerB) <===> combiner()
reduce(container) <===> finisher()
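
As a rough illustration of the one-to-one match, the sketch below writes the same "gather everything into a list" behavior both ways. The AccumulatorCombinerReducer interface is copied from this page (without collectorCharacteristics()); ToListAcr, OneToOneDemo, runAcr, and runCollector are hypothetical names invented for this example, not part of the course code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

// Copied from this page; collectorCharacteristics() is left out for brevity.
interface AccumulatorCombinerReducer<V, A, R> {
    A createMutableContainer();
    void accumulate(A container, V item);
    void combine(A containerA, A containerB);
    R reduce(A container);
}

// Hypothetical example: each method performs its operation directly.
class ToListAcr<V> implements AccumulatorCombinerReducer<V, List<V>, List<V>> {
    @Override public List<V> createMutableContainer() { return new ArrayList<>(); }
    @Override public void accumulate(List<V> container, V item) { container.add(item); }
    @Override public void combine(List<V> containerA, List<V> containerB) { containerA.addAll(containerB); }
    @Override public List<V> reduce(List<V> container) { return container; }
}

class OneToOneDemo {
    // The same behavior as a Collector: each piece is handed over as a lambda.
    static <V> Collector<V, List<V>, List<V>> toListCollector() {
        return Collector.of(
                () -> new ArrayList<V>(),                 // supplier
                (container, item) -> container.add(item), // accumulator
                (a, b) -> { a.addAll(b); return a; },     // combiner
                container -> container);                  // finisher
    }

    // Sequential pass driven by hand through the AccumulatorCombinerReducer.
    static <V> List<V> runAcr(List<V> items) {
        ToListAcr<V> acr = new ToListAcr<>();
        List<V> container = acr.createMutableContainer();
        for (V item : items) {
            acr.accumulate(container, item);
        }
        return acr.reduce(container);
    }

    // The streams framework drives the Collector for us.
    static <V> List<V> runCollector(List<V> items) {
        return items.stream().collect(OneToOneDemo.<V>toListCollector());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not");
        System.out.println(runAcr(words));
        System.out.println(runCollector(words));
    }
}
```

Both calls should print the same list; the only real difference is whether the four operations are methods you call or lambdas you hand back.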

Differences

Each interface provides a way to customize 4 of 5 of the methods required for Spark-level MapReduce functionality. The AccumulatorCombinerReducer<V,A,R> interface elects to perform the operations directly. The Collector<T,A,R> interface elects to return single-abstract-method interfaces which perform the operations.

interface Collector<T,A,R>

interface Supplier<T> returned by supplier()
interface BiConsumer<T,U> returned by accumulator()
interface BinaryOperator<T> returned by combiner()
interface Function<T,R> returned by finisher()

Finally, the combine operation was simplified by adding a reasonable constraint to AccumulatorCombinerReducer: containerB is always combined into containerA. This allows for a slightly cleaner, more elegant solution to the Matrix MapReduce Framework exercise.
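
To make the "returns single-abstract-method interfaces" point concrete, here is a minimal sketch that drives a standard Collector by hand instead of passing it to a stream. CollectorByHand and collectByHand are hypothetical names for this example; the two input lists stand in for the halves that two parallel tasks might process.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorByHand {
    // Accumulates each half into its own container, combines them, then finishes.
    static <T, A, R> R collectByHand(Collector<T, A, R> collector, List<T> left, List<T> right) {
        // Each Collector method hands back a functional interface...
        Supplier<A> supplier = collector.supplier();
        BiConsumer<A, T> accumulator = collector.accumulator();
        BinaryOperator<A> combiner = collector.combiner();
        Function<A, R> finisher = collector.finisher();

        // ...and invoking that interface performs the actual operation.
        A containerA = supplier.get();
        for (T item : left) {
            accumulator.accept(containerA, item);
        }
        A containerB = supplier.get();
        for (T item : right) {
            accumulator.accept(containerB, item);
        }

        // The combined container is whatever the combiner returns.
        A combined = combiner.apply(containerA, containerB);
        return finisher.apply(combined);
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b");
        List<String> right = Arrays.asList("c");
        Collector<String, ?, List<String>> toList = Collectors.toList();
        System.out.println(collectByHand(toList, left, right)); // prints [a, b, c]
    }
}
```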

Methods

createMutableContainer a.k.a. supplier get

We use createMutableContainer() to create a new mutable container. For classic MapReduce this would be a List<V>.

rosetta stone: container = collector.supplier().get() ↔ container = acr.createMutableContainer()

accumulate a.k.a. accumulator accept

We use accumulate(container,item) to accumulate a value into a container. For classic MapReduce this would add an item to a list.

rosetta stone: collector.accumulator().accept(container,item) ↔ acr.accumulate(container,item)

combine a.k.a. combiner apply

We use combine(containerA,containerB) to combine two containers. With AccumulatorCombinerReducer, combine returns void, so you must combine containerB into containerA. With Collector, you may combine in either direction; just return whichever container holds the combined result.

rosetta stone: collector.combiner().apply(containerA,containerB) ↔ acr.combine(containerA,containerB)

Note: interface Collector<T,A,R> allows the client to combine containerB into containerA, containerA into containerB, or create a new container with the combined contents of containerA and containerB. The only requirement is that resulting combined container is the return value. interface AccumulatorCombinerReducer<V,A,R> mandates that containerB be combined into containerA to take a bit of cruft out of the Matrix MapReduce Framework exercise.
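
The difference in the note can be sketched with a one-element int[] used as a running-sum container. Everything named here (CombineContrast, FRESH_CONTAINER_COMBINER, the helper methods) is hypothetical and only illustrates the two calling conventions.

```java
import java.util.function.BinaryOperator;

class CombineContrast {
    // Collector style: the combiner may build a brand-new container,
    // so the caller has to use the returned value.
    static final BinaryOperator<int[]> FRESH_CONTAINER_COMBINER =
            (a, b) -> new int[] { a[0] + b[0] };

    // AccumulatorCombinerReducer style: nothing is returned,
    // containerB is always folded into containerA.
    static void combine(int[] containerA, int[] containerB) {
        containerA[0] += containerB[0];
    }

    static int sumViaCollectorStyle(int left, int right) {
        int[] a = { left };
        int[] b = { right };
        int[] combined = FRESH_CONTAINER_COMBINER.apply(a, b); // a and b are left untouched
        return combined[0];
    }

    static int sumViaAcrStyle(int left, int right) {
        int[] a = { left };
        int[] b = { right };
        combine(a, b); // afterwards a holds the combined result
        return a[0];
    }

    public static void main(String[] args) {
        System.out.println(sumViaCollectorStyle(3, 4)); // prints 7
        System.out.println(sumViaAcrStyle(3, 4));       // prints 7
    }
}
```

With the void convention the caller always knows which container survives, which is the bit of cruft the constraint removes.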

reduce a.k.a. finisher apply

We use reduce(container) to reduce a container to its final result.

rosetta stone: r = collector.finisher().apply(container) ↔ r = acr.reduce(container)
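
Putting the four methods together, here is a small sketch in which the container type A and the result type R differ, so reduce does real work. AverageAcr and AverageDemo are hypothetical names; the interface is copied from this page without collectorCharacteristics(), and the two lists again stand in for work done by two separate tasks.

```java
import java.util.Arrays;
import java.util.List;

// Copied from this page; collectorCharacteristics() is left out for brevity.
interface AccumulatorCombinerReducer<V, A, R> {
    A createMutableContainer();
    void accumulate(A container, V item);
    void combine(A containerA, A containerB);
    R reduce(A container);
}

// Hypothetical example: the container is {sum, count}, the result is the mean.
class AverageAcr implements AccumulatorCombinerReducer<Integer, long[], Double> {
    @Override public long[] createMutableContainer() { return new long[2]; }

    @Override public void accumulate(long[] container, Integer item) {
        container[0] += item; // running sum
        container[1]++;       // running count
    }

    @Override public void combine(long[] containerA, long[] containerB) {
        containerA[0] += containerB[0];
        containerA[1] += containerB[1];
    }

    @Override public Double reduce(long[] container) {
        // An empty container reduces to 0.0 here; that choice is an assumption.
        return container[1] == 0 ? 0.0 : (double) container[0] / container[1];
    }
}

class AverageDemo {
    static double average(List<Integer> left, List<Integer> right) {
        AverageAcr acr = new AverageAcr();
        long[] containerA = acr.createMutableContainer();
        for (Integer item : left) {
            acr.accumulate(containerA, item);
        }
        long[] containerB = acr.createMutableContainer();
        for (Integer item : right) {
            acr.accumulate(containerB, item);
        }
        acr.combine(containerA, containerB); // containerB is folded into containerA
        return acr.reduce(containerA);
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2), Arrays.asList(3, 4))); // prints 2.5
    }
}
```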

Converting Back And Forth

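import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;

// AccumulatorCombinerReducer is the course interface shown above; import it from its course package.
public class StreamUtils {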
	public static <V, A, R> Collector<V, A, R> toCollector(
			AccumulatorCombinerReducer<V, A, R> accumulatorCombinerReducer) {
		return new Collector<V, A, R>() {
			@Override
			public Supplier<A> supplier() {
				return () -> accumulatorCombinerReducer.createMutableContainer();
			}

			@Override
			public BiConsumer<A, V> accumulator() {
				return (container, item) -> accumulatorCombinerReducer.accumulate(container, item);
			}

			@Override
			public BinaryOperator<A> combiner() {
				return (a, b) -> {
					accumulatorCombinerReducer.combine(a, b);
					return a;
				};
			}

			@Override
			public Function<A, R> finisher() {
				return (container) -> accumulatorCombinerReducer.reduce(container);
			}

			@Override
			public Set<Characteristics> characteristics() {
				return accumulatorCombinerReducer.collectorCharacteristics();
			}
		};
	}

	public static <V, A, R> AccumulatorCombinerReducer<V, A, R> toAccumulatorCombinerReducer(
			Collector<V, A, R> collector) {
		return new AccumulatorCombinerReducer<V, A, R>() {
			@Override
			public A createMutableContainer() {
				return collector.supplier().get();
			}

			@Override
			public void accumulate(A container, V item) {
				collector.accumulator().accept(container, item);
			}

			@Override
			public void combine(A containerA, A containerB) {
				A result = collector.combiner().apply(containerA, containerB);
				if (result != containerA) {
					throw new RuntimeException("collector must combine b into a and return a.");
				}
			}

			@Override
			public R reduce(A container) {
				return collector.finisher().apply(container);
			}

			@Override
			public Set<Characteristics> collectorCharacteristics() {
				return collector.characteristics();
			}
		};
	}
}
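
For comparison, the same adaptation can be sketched more compactly with the standard Collector.of factory and then handed to a parallel stream. This is an alternative to the anonymous class above, not the course's StreamUtils: CompactAdapter, SumAcr, and sumOneTo are hypothetical names, the interface is copied from this page without collectorCharacteristics(), and no characteristics are forwarded.

```java
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Copied from this page; collectorCharacteristics() is left out for brevity.
interface AccumulatorCombinerReducer<V, A, R> {
    A createMutableContainer();
    void accumulate(A container, V item);
    void combine(A containerA, A containerB);
    R reduce(A container);
}

class CompactAdapter {
    // Collector.of builds a Collector from the four functions.
    // Passing no Characteristics yields a collector with none set.
    static <V, A, R> Collector<V, A, R> toCollector(AccumulatorCombinerReducer<V, A, R> acr) {
        return Collector.of(
                acr::createMutableContainer,
                acr::accumulate,
                (a, b) -> {
                    acr.combine(a, b); // b is folded into a...
                    return a;          // ...so a is the combined container
                },
                acr::reduce);
    }

    // Hypothetical example: sums integers in a one-element int[] container.
    static class SumAcr implements AccumulatorCombinerReducer<Integer, int[], Integer> {
        @Override public int[] createMutableContainer() { return new int[1]; }
        @Override public void accumulate(int[] container, Integer item) { container[0] += item; }
        @Override public void combine(int[] containerA, int[] containerB) { containerA[0] += containerB[0]; }
        @Override public Integer reduce(int[] container) { return container[0]; }
    }

    // The parallel stream creates several containers and merges them via combiner().
    static int sumOneTo(int n) {
        return IntStream.rangeClosed(1, n)
                .boxed()
                .parallel()
                .collect(toCollector(new SumAcr()));
    }

    public static void main(String[] args) {
        System.out.println(sumOneTo(100)); // prints 5050
    }
}
```

Going the other way is less forgiving: as the runtime check in toAccumulatorCombinerReducer shows, a Collector whose combiner returns anything other than containerA cannot satisfy the void combine contract.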