MapReduce Mapper Assignment
Motivation
In previous semesters the MapReduce exercise has proven to be the most challenging. We will start by building some Mappers on our way to the final boss.
Each of the Mappers built today can be paired with an Int Summing AccumulatorCombinerReducer:
- a card mapper that matches the spec outlined in the prep video,
- a simple word counting mapper, and
- an analogous k-mer counting mapper.
Note: the k-mer counting mapper will prepare us for (and hopefully lessen the burden of) an exercise later in the semester.
Code To Use
Previous Exercise
Provided
CardMapper Utilities
Deck implements Iterable<Card>
- rank.numericValue() note: returns Optional<Integer>
WordCount Mapper Utilities
K-mer Mapper Utilities
toStringKMer(sequence, offset, kMerLength) |
---|
private static String toStringKMer(byte[] sequence, int offset, int kMerLength) {
    return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8);
}
|
Code To Implement
Card Mapper
The specification for this mapper is outlined in the prep video:
Video: Learning MapReduce with Playing Cards |
---|
Clients
CardMapperClient
class: | CardMapperClient.java | CLIENT |
package: | mapreduce.apps.cards.client | |
source folder: | student/src/main/java |
CardMapperClient |
---|
Deck deck = Deck.createFull();
CardMapper mapper = new CardMapper();
List<Map.Entry<Suit, Integer>> keyValuePairs = mapper.map(deck);
keyValuePairs.forEach(kv -> {
    System.out.println(kv);
});
|
CardMapperClient Output |
---|
DefaultEntry[SPADES=>10]
DefaultEntry[SPADES=>9]
DefaultEntry[SPADES=>8]
DefaultEntry[SPADES=>7]
DefaultEntry[SPADES=>6]
DefaultEntry[SPADES=>5]
DefaultEntry[SPADES=>4]
DefaultEntry[SPADES=>3]
DefaultEntry[SPADES=>2]
DefaultEntry[HEARTS=>10]
DefaultEntry[HEARTS=>9]
DefaultEntry[HEARTS=>8]
DefaultEntry[HEARTS=>7]
DefaultEntry[HEARTS=>6]
DefaultEntry[HEARTS=>5]
DefaultEntry[HEARTS=>4]
DefaultEntry[HEARTS=>3]
DefaultEntry[HEARTS=>2]
DefaultEntry[DIAMONDS=>10]
DefaultEntry[DIAMONDS=>9]
DefaultEntry[DIAMONDS=>8]
DefaultEntry[DIAMONDS=>7]
DefaultEntry[DIAMONDS=>6]
DefaultEntry[DIAMONDS=>5]
DefaultEntry[DIAMONDS=>4]
DefaultEntry[DIAMONDS=>3]
DefaultEntry[DIAMONDS=>2]
DefaultEntry[CLUBS=>10]
DefaultEntry[CLUBS=>9]
DefaultEntry[CLUBS=>8]
DefaultEntry[CLUBS=>7]
DefaultEntry[CLUBS=>6]
DefaultEntry[CLUBS=>5]
DefaultEntry[CLUBS=>4]
DefaultEntry[CLUBS=>3]
DefaultEntry[CLUBS=>2]
|
CardMapReduceClient
class: | CardMapReduceClient.java | CLIENT |
package: | mapreduce.apps.cards.client | |
source folder: | student/src/main/java |
CardMapReduceClient |
---|
Deck[] decks = new Deck[] {
    Deck.createFull(),
    Deck.createFull(),
    Deck.createFull(),
    Deck.createFull(),
};
CardMapper mapper = new CardMapper();
AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = StreamUtils.summingIntAccumulatorCombinerReducer();
MapReduceFramework<Deck, Suit, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
Map<Suit, Integer> map = framework.mapReduceAll(decks);
map.entrySet().forEach(entry -> {
    System.out.println(entry);
});
|
CardMapReduceClient Output |
---|
HEARTS=216
SPADES=216
DIAMONDS=216
CLUBS=216
|
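To see where 216 comes from: each full deck contributes 2 + 3 + ... + 10 = 54 per suit, and four decks give 4 × 54 = 216. The sketch below reproduces that arithmetic with a plain `Map.merge` loop. It is only a stand-in for the course's StreamMapReduceFramework and summing reducer, not their implementation; `SketchSuit`, `SummingReductionSketch`, and both helper methods are hypothetical names introduced for this illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the course's Suit type.
enum SketchSuit { SPADES, HEARTS, DIAMONDS, CLUBS }

final class SummingReductionSketch {
    // Builds the (suit, value) pairs a card mapper would emit for one full deck,
    // assuming only the numeric ranks 2..10 are emitted.
    static List<Map.Entry<SketchSuit, Integer>> emittedPairsForOneDeck() {
        List<Map.Entry<SketchSuit, Integer>> pairs = new ArrayList<>();
        for (SketchSuit suit : SketchSuit.values()) {
            for (int value = 10; value >= 2; value--) {
                pairs.add(new SimpleImmutableEntry<>(suit, value));
            }
        }
        return pairs;
    }

    // Sums the values per key, mimicking what an int-summing reduction produces.
    static Map<SketchSuit, Integer> sumByKey(List<Map.Entry<SketchSuit, Integer>> pairs) {
        Map<SketchSuit, Integer> totals = new EnumMap<>(SketchSuit.class);
        for (Map.Entry<SketchSuit, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<SketchSuit, Integer>> allPairs = new ArrayList<>();
        for (int deck = 0; deck < 4; deck++) {
            allPairs.addAll(emittedPairsForOneDeck());
        }
        System.out.println(sumByKey(allPairs));
    }
}
```

Running `main` totals 216 for every suit, matching the client output above.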
Implementation
Non-numeric cards are considered to be bad data and ignored. Numeric cards should be emitted with their suit as the key and the numeric value as the value.
Emitting a key and a value would look something like:
Suit key = // appropriate code here
int value = // appropriate code here
keyValuePairConsumer.accept(key, value);
class: | CardMapper.java | |
methods: | map | |
package: | mapreduce.apps.cards.exercise | |
source folder: | student/src/main/java |
method: public void map(Deck deck, BiConsumer<Suit, Integer> keyValuePairConsumer)
(sequential implementation only)
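Since rank.numericValue() returns an Optional<Integer>, one possible shape for the filtering is to emit only when the Optional holds a value. The following is a minimal sketch under stated assumptions, not the course's solution: `DemoSuit`, `DemoRank`, `DemoCard`, and `CardMapperSketch` are hypothetical stand-ins written so the example compiles on its own, and the real Card and Deck accessors may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiConsumer;

// Hypothetical stand-ins for the course's Suit, Rank and Card types.
enum DemoSuit { SPADES, HEARTS }

enum DemoRank {
    TWO(2), TEN(10), JACK(null);

    private final Integer numericValue; // null for non-numeric ranks

    DemoRank(Integer numericValue) {
        this.numericValue = numericValue;
    }

    // Mirrors the documented rank.numericValue(), which returns Optional<Integer>.
    Optional<Integer> numericValue() {
        return Optional.ofNullable(numericValue);
    }
}

final class DemoCard {
    final DemoSuit suit;
    final DemoRank rank;

    DemoCard(DemoSuit suit, DemoRank rank) {
        this.suit = suit;
        this.rank = rank;
    }
}

final class CardMapperSketch {
    // Emits (suit, numeric value) for numeric cards; skips the rest as bad data.
    static void map(Iterable<DemoCard> deck, BiConsumer<DemoSuit, Integer> keyValuePairConsumer) {
        for (DemoCard card : deck) {
            card.rank.numericValue()
                    .ifPresent(value -> keyValuePairConsumer.accept(card.suit, value));
        }
    }

    public static void main(String[] args) {
        List<DemoCard> deck = List.of(
                new DemoCard(DemoSuit.SPADES, DemoRank.TEN),
                new DemoCard(DemoSuit.SPADES, DemoRank.JACK),
                new DemoCard(DemoSuit.HEARTS, DemoRank.TWO));
        List<String> emitted = new ArrayList<>();
        map(deck, (suit, value) -> emitted.add(suit + "=" + value));
        System.out.println(emitted); // the JACK is skipped
    }
}
```

Using `ifPresent` keeps the "ignore bad data" rule in one place; an explicit `isPresent()` check followed by `get()` would behave the same way.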
Word Count Mapper
Counting occurrences of words in text is a classic example of MapReduce. We will ignore any zero-length words and convert the remaining words to lower-case to get a case-insensitive count. Emitting each lower-cased word as the key with the value of 1 should do the trick here.
class: | WordCountMapper.java | |
methods: | map | |
package: | mapreduce.apps.wordcount.exercise | |
source folder: | student/src/main/java |
method: public void map(TextSection textSection, BiConsumer<String, Integer> keyValuePairConsumer)
(sequential implementation only)
The goal of this implementation is to count the number of times a word appears in a given text, using MapReduce. To accomplish this, you will need to create both the mapper and the reducer. Navigate to the WordCountMapper.java and IntSumListAccumulatingReducer.java classes. You will specifically define how the framework accomplishes the map and reduce methods.
The only method you will need to alter is the map method. In this method, you need to emit every occurrence of each word to the keyValuePairConsumer. To do this, iterate over all of the words in the TextSection and, if the length of a word is greater than zero (meaning it is not just blank space), convert it to lower-case and pass it to the consumer's accept method as the key, with 1 as the value.
Hint: Look at the methods in TextSection and the toLowerCase() method for strings for assistance.
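The filtering and lower-casing described above can be sketched as follows. This is an illustration under assumptions rather than the expected solution: the TextSection API is not shown here, so the hypothetical `WordCountMapperSketch.map` takes an `Iterable<String>` of words directly, and the `count` helper uses `Map.merge` as a stand-in for the course's summing reducer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

final class WordCountMapperSketch {
    // Hypothetical stand-in: takes the words directly instead of a TextSection.
    static void map(Iterable<String> words, BiConsumer<String, Integer> keyValuePairConsumer) {
        for (String word : words) {
            if (word.length() > 0) { // ignore zero-length words
                keyValuePairConsumer.accept(word.toLowerCase(), 1);
            }
        }
    }

    // Sums the emitted 1s per key, standing in for the int-summing reducer.
    static Map<String, Integer> count(Iterable<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        map(words, (word, one) -> counts.merge(word, one, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // "The" and "THE" collapse into one key; the empty string is dropped.
        System.out.println(count(List.of("The", "", "cat", "THE")));
    }
}
```

Because every emitted value is 1, summing the values for a key yields the number of times that word occurred.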
K-mer Count Mapper
K-mer counting is a useful technique in bioinformatics: http://www.csbio.unc.edu/mcmillan/Comp555S17/Lecture02.pdf
Background information on k-mer counting can be found here: https://en.wikipedia.org/wiki/K-mer
The 3-mers in the chromosome:
ACTCATGAG
are:
ACT CTC TCA CAT ATG TGA GAG
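The list above comes from sliding a window of length k one position at a time, which yields n - k + 1 k-mers for a sequence of length n (here 9 - 3 + 1 = 7). The sketch below only enumerates the k-mers and deliberately emits nothing. `KMerWindowSketch` and its `kMers` method are hypothetical names; `toStringKMer` is copied from the provided utility.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class KMerWindowSketch {
    // Copied from the provided utility.
    private static String toStringKMer(byte[] sequence, int offset, int kMerLength) {
        return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8);
    }

    // Collects every overlapping k-mer; the last valid offset is length - k.
    static List<String> kMers(byte[] sequence, int kMerLength) {
        List<String> result = new ArrayList<>();
        for (int offset = 0; offset + kMerLength <= sequence.length; offset++) {
            result.add(toStringKMer(sequence, offset, kMerLength));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] chromosome = "ACTCATGAG".getBytes(StandardCharsets.US_ASCII);
        System.out.println(kMers(chromosome, 3));
        // prints [ACT, CTC, TCA, CAT, ATG, TGA, GAG]
    }
}
```

A sequence shorter than k produces no k-mers at all, since the loop condition fails at offset 0.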
class: | KMerMapper.java | |
methods: | map | |
package: | mapreduce.apps.kmer.studio | |
source folder: | student/src/main/java |
method: public void map(byte[] sequence, BiConsumer<String, Integer> keyValuePairConsumer)
(sequential implementation only)
Be sure to use the provided toStringKMer(sequence, offset, kMerLength) method to generate your k-mers:
private static String toStringKMer(byte[] sequence, int offset, int kMerLength) {
    return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8);
}
This mapper is similar to the Word Count Mapper, except that k-mers overlap one another while words are separate.
Since the emitted values for each key will later be summed in the reduction phase, what value makes sense to emit with each key?
Testing Your Solution
Complete Correctness
class: | _MappersSuitableForPairingWithIntSummingReducerTestSuite.java | |
package: | mapreduce.apps | |
source folder: | testing/src/test/java |
Individual Correctness
CardMapper
class: | _CardMapperTestSuite.java | |
package: | mapreduce.apps.cards.exercise | |
source folder: | testing/src/test/java |
KMerMapper
class: | _KMerMapperTestSuite.java | |
package: | mapreduce.apps.kmer.exercise | |
source folder: | testing/src/test/java |
WordCountMapper
class: | _WordCountMapperTestSuite.java | |
package: | mapreduce.apps.wordcount.exercise | |
source folder: | testing/src/test/java |