Difference between revisions of "MapReduce Frameworks Lab"
Line 5: | Line 5: | ||
Let’s start with the Framework. At the end of the day, the first job of the Framework is mapAll() which unsurprisingly calls map() on the Mapper for every piece of data. That’s it. Don’t overcomplicate it. The framework leaves all of the details of what (key, value) pairs to emit to the Mapper. Just call map() for every piece of data. Granted, it does need to create an instance of Context to receive those emissions. For the SimpleFramework we end up creating a context that holds a single dictionary to handle all of the key, value pairs the Mapper throws at it. This IndividualContext leaves grouping to the next stage so it can simply associate the keys with the values and be done. For the MatrixFramework, we end up mapping and grouping at once so its GroupStageSkippingContext is a bit more complicated, but the mapAndGroupAll() method on MatrixFramework stays nice and simple. It call map() on the Mapper for each piece of data. As stated before, don’t overcomplicate it. You will want to create an n-way-split-esque separation of the data into tasks, each with its own GroupStageSkippingContext, but don’t lose sight of the goal: call map on the Mapper for each piece of the data. | Let’s start with the Framework. At the end of the day, the first job of the Framework is mapAll() which unsurprisingly calls map() on the Mapper for every piece of data. That’s it. Don’t overcomplicate it. The framework leaves all of the details of what (key, value) pairs to emit to the Mapper. Just call map() for every piece of data. Granted, it does need to create an instance of Context to receive those emissions. For the SimpleFramework we end up creating a context that holds a single dictionary to handle all of the key, value pairs the Mapper throws at it. This IndividualContext leaves grouping to the next stage so it can simply associate the keys with the values and be done. For the MatrixFramework, we end up mapping and grouping at once so its GroupStageSkippingContext is a bit more complicated, but the mapAndGroupAll() method on MatrixFramework stays nice and simple. It call map() on the Mapper for each piece of data. As stated before, don’t overcomplicate it. You will want to create an n-way-split-esque separation of the data into tasks, each with its own GroupStageSkippingContext, but don’t lose sight of the goal: call map on the Mapper for each piece of the data. | ||
− | Next up, let’s talk about the Mapper’s role. | + | Next up, let’s talk about the Mapper’s role. The Mapper is application specific. By that I mean, if we were distributing a system to compete with Hadoop we would build one MatrixFramework as well as we could and ship it. Users of our system would then write any number of custom Mappers and Reducers to solve their particular problems. |
+ | The Mapper is passed two parameters: a context which knows how to receive (key,value) pairs and a piece of data to process. By “process” we mean simply figure out what (key, value) pairs it wants to emit and emit them. We have built a number of Mappers in this course. The WordCountMapper processes a line of text and emits pairs of (key=word, value=1) for each word in the line. The MutualFriendsMapper emits (key=friendPair, value=account’sFriendList) for each of the passed in account’s friends. The CardMapper emits pairs of (key=suit, value=numericValue) for each card it is passed. The CholoraMapper is passed a residence location for someone who died in the outbreak and emits a single (key, value) pair: (key=the water pump closest to that location, value=1). | ||
=Part 1 Simple Framework= | =Part 1 Simple Framework= |
Revision as of 06:15, 10 March 2017
The MapReduce assignment has many different pieces that come together to solve problems. So much so that in the end writing each individual class or method can be less challenging than simply figuring out what each piece’s role should be in the larger puzzle.
At a high level we have a framework, a mapper, a reducer, and some data we would like to process.
Let’s start with the Framework. At the end of the day, the first job of the Framework is mapAll() which unsurprisingly calls map() on the Mapper for every piece of data. That’s it. Don’t overcomplicate it. The framework leaves all of the details of what (key, value) pairs to emit to the Mapper. Just call map() for every piece of data. Granted, it does need to create an instance of Context to receive those emissions. For the SimpleFramework we end up creating a context that holds a single dictionary to handle all of the key, value pairs the Mapper throws at it. This IndividualContext leaves grouping to the next stage so it can simply associate the keys with the values and be done. For the MatrixFramework, we end up mapping and grouping at once so its GroupStageSkippingContext is a bit more complicated, but the mapAndGroupAll() method on MatrixFramework stays nice and simple. It call map() on the Mapper for each piece of data. As stated before, don’t overcomplicate it. You will want to create an n-way-split-esque separation of the data into tasks, each with its own GroupStageSkippingContext, but don’t lose sight of the goal: call map on the Mapper for each piece of the data.
Next up, let’s talk about the Mapper’s role. The Mapper is application specific. By that I mean, if we were distributing a system to compete with Hadoop we would build one MatrixFramework as well as we could and ship it. Users of our system would then write any number of custom Mappers and Reducers to solve their particular problems.
The Mapper is passed two parameters: a context which knows how to receive (key,value) pairs and a piece of data to process. By “process” we mean simply figure out what (key, value) pairs it wants to emit and emit them. We have built a number of Mappers in this course. The WordCountMapper processes a line of text and emits pairs of (key=word, value=1) for each word in the line. The MutualFriendsMapper emits (key=friendPair, value=account’sFriendList) for each of the passed in account’s friends. The CardMapper emits pairs of (key=suit, value=numericValue) for each card it is passed. The CholoraMapper is passed a residence location for someone who died in the outbreak and emits a single (key, value) pair: (key=the water pump closest to that location, value=1).
Contents
- 1 Part 1 Simple Framework
- 1.1 class StudentIndividualMapContext
- 1.2 Word Count Application
- 1.3 Mutual Friends Application
- 1.4 class WordCountConcreteStaticMapReduce (optional but encouraged)
- 1.5 class MutualFriendsConcreteStaticMapReduce (optional but encouraged)
- 1.6 SimpleMapReduceFramework<E,K,V>
- 1.7 FinishAccumulatorBasedReducer
- 2 Part 2 Matrix Framework
Part 1 Simple Framework
class StudentIndividualMapContext
public void emit(K key, V value)
Word Count Application
class WordCountMapper
public void map(MapContext<String, Integer> context, String[] lineOfWords)
class WordCountReducer
public FinishAccumulator<Integer> createAccumulator()
Test
class TestWordCountApplicationWithInstructorFramework
Mutual Friends Application
class MutualFriendsMapper
public void map(MapContext<OrderedPair<AccountId>, List<AccountId>> context, Account account)
class MutualFriendsReducer
public FinishAccumulator<List<AccountId>> createAccumulator()
see: creating a custom reduction accumulator and ListIntersectionReducer<T>
Test
class TestMutualFriendsApplicationWithInstructorFramework
class WordCountConcreteStaticMapReduce (optional but encouraged)
mapAll
private static IndividualMapContext<String, Integer>[] mapAll(String[][] S, WordCountMapper mapper) throws SuspendableException
groupAll
private static Map<String, List<Integer>> groupAll(IndividualMapContext<String, Integer>[] f_of_S)
reduceAll
private static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped_f_of_S, WordCountReducer reducer) throws SuspendableException
Test
class TestWordCountConcreteStaticFramework
class MutualFriendsConcreteStaticMapReduce (optional but encouraged)
mapAll
private static IndividualMapContext<OrderedPair<AccountId>, List<AccountId>>[] mapAll(Account[] S, MutualFriendsMapper mapper) throws SuspendableException
groupAll
private static Map<OrderedPair<AccountId>, List<List<AccountId>>> groupAll(IndividualMapContext<OrderedPair<AccountId>, List<AccountId>>[] f_of_S)
reduceAll
private static Map<OrderedPair<AccountId>, List<AccountId>> reduceAll(Map<OrderedPair<AccountId>, List<List<AccountId>>> grouped_f_of_S, MutualFriendsReducer reducer) throws SuspendableException
Test
class TestMutualFriendsConcreteStaticFramework
SimpleMapReduceFramework<E,K,V>
mapAll
private IndividualMapContext<K,V>[] mapAll(E[] S, Mapper<E,K,V> mapper) throws SuspendableException
groupAll
private Map<K, List<V>> groupAll(IndividualMapContext<K,V>[] f_of_S)
reduceAll
private Map<K, V> reduceAll(Map<K, List<V>> grouped_f_of_S, Reducer<V> reducer) throws SuspendableException
Test
class TestMutualFriendsSolution
FinishAccumulatorBasedReducer
reduce
default V reduce( List<V> list ) throws SuspendableException
Part 2 Matrix Framework
class StudentGroupStageSkippingMapContext<K,V>
emit
public void emit(K key, V value)
class MatrixMapReduceFramework<E,K,V>
mapGroupAll
private GroupStageSkippingMapContext<K, V>[] mapGroupAll(E[] S, Mapper<E, K, V> mapper) throws SuspendableException
reduceAll
private Map<K,V> reduceAll(GroupStageSkippingMapContext<K, V>[] grouped_f_of_S, Reducer<V> reducer) throws SuspendableException