Difference between revisions of "MapReduce Mapper Assignment"
m (→CardMapper) |
(60 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
=Motivation= | =Motivation= | ||
− | In previous semesters the MapReduce | + | In previous semesters the MapReduce exercise has proven to be the most challenging. We will start by building some Mappers on our way to the [[Matrix_MapReduce_Framework_Assignment|final boss]]. |
− | + | Each of the Mappers built today can be paired with an Int Summing AccumulatorCombinerReducer: |
+ | * a card mapper that matches the spec outlined in the prep video, | ||
+ | * a simple word counting mapper, and | ||
+ | * an analogous k-mer counting mapper. | ||
− | + | Note: the k-mer counting mapper will prepare us for (and hopefully lessen the burden of) an [[K_Mer_Counting_Assignment|exercise]] later in the semester. | |
=Code To Use= | =Code To Use= | ||
− | [ | + | ==Previous Exercise== |
− | + | [[Not_Thread_Safe_Hash_Table_Assignment#DefaultEntry.3CK.2CV.3E|DefaultEntry<K,V>]] | |
− | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Deck.html Deck] | + | ==Provided== |
+ | ===CardMapper Utilities=== | ||
+ | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Deck.html Deck] '''implements Iterable<Card>''' | ||
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Card.html Card] | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Card.html Card] | ||
− | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Card.html# | + | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Card.html#rank-- card.rank()] |
− | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Card.html# | + | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Card.html#suit-- card.suit()] |
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Rank.html Rank] | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Rank.html Rank] | ||
− | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Rank.html# | + | : [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Rank.html#numericValue-- rank.numericValue()] '''note: returns Optional<Integer>''' |
− | : | ||
[https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Suit.html Suit] | [https://www.cse.wustl.edu/~cosgroved/courses/cse231/current/apidocs/mapreduce/apps/cards/core/Suit.html Suit] | ||
− | [https://www.cse.wustl.edu/~ | + | ===WordCount Mapper Utilities=== |
− | : [https://www.cse.wustl.edu/~ | + | [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/edu/wustl/cse231s/text/core/TextSection.html TextSection] |
+ | : [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/current/apidocs/edu/wustl/cse231s/text/core/TextSection.html#words-- textSection.words()] | ||
+ | |||
+ | ===K-mer Mapper Utilities=== | ||
+ | {{CollapsibleCode|toStringKMer(sequence, offset, kMerLength)| | ||
+ | <syntaxhighlight lang="java"> | ||
+ | private static String toStringKMer(byte[] sequence, int offset, int kMerLength) { | ||
+ | return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8); | ||
+ | } | ||
+ | </syntaxhighlight>}} | ||
+ | |||
+ | =Code To Investigate= | ||
+ | Note: each of the clients prints entries. The entries produced by the map methods of the Mappers are instances of DefaultEntry. The entries produced by the StreamFramework are instances of a different implementation of Entry. Their toString() methods might be slightly different, but rest assured they are all Entries. | ||
+ | ==Card Mapping Clients== | ||
+ | ===CardMapperClient=== | ||
+ | {{Client|CardMapperClient|mapreduce.apps.cards.client|main}} | ||
+ | |||
+ | {{CollapsibleCode|CardMapperClient| | ||
+ | <syntaxhighlight lang="java"> | ||
+ | Deck deck = Deck.createFull(); | ||
+ | CardMapper mapper = new CardMapper(); | ||
+ | List<Map.Entry<Suit, Integer>> keyValuePairs = mapper.map(deck); | ||
+ | keyValuePairs.forEach(kv -> { | ||
+ | System.out.println(kv); | ||
+ | }); | ||
+ | </syntaxhighlight>}} | ||
+ | |||
+ | {{CollapsibleConsole|CardMapperClient Output|<pre style="border: 0px; background: #000; color:#fff;">SPADES=10 | ||
+ | SPADES=9 | ||
+ | SPADES=8 | ||
+ | SPADES=7 | ||
+ | SPADES=6 | ||
+ | SPADES=5 | ||
+ | SPADES=4 | ||
+ | SPADES=3 | ||
+ | SPADES=2 | ||
+ | HEARTS=10 | ||
+ | HEARTS=9 | ||
+ | HEARTS=8 | ||
+ | HEARTS=7 | ||
+ | HEARTS=6 | ||
+ | HEARTS=5 | ||
+ | HEARTS=4 | ||
+ | HEARTS=3 | ||
+ | HEARTS=2 | ||
+ | DIAMONDS=10 | ||
+ | DIAMONDS=9 | ||
+ | DIAMONDS=8 | ||
+ | DIAMONDS=7 | ||
+ | DIAMONDS=6 | ||
+ | DIAMONDS=5 | ||
+ | DIAMONDS=4 | ||
+ | DIAMONDS=3 | ||
+ | DIAMONDS=2 | ||
+ | CLUBS=10 | ||
+ | CLUBS=9 | ||
+ | CLUBS=8 | ||
+ | CLUBS=7 | ||
+ | CLUBS=6 | ||
+ | CLUBS=5 | ||
+ | CLUBS=4 | ||
+ | CLUBS=3 | ||
+ | CLUBS=2</pre>}} | ||
+ | |||
+ | ===CardMapReduceClient=== | ||
+ | {{Client|CardMapReduceClient|mapreduce.apps.cards.client|main}} | ||
+ | |||
+ | {{CollapsibleCode|CardMapReduceClient| | ||
+ | <syntaxhighlight lang="java"> | ||
+ | Deck[] decks = { | ||
+ | Deck.createFull(), | ||
+ | Deck.createFull(), | ||
+ | Deck.createFull(), | ||
+ | Deck.createFull(), | ||
+ | }; | ||
+ | CardMapper mapper = new CardMapper(); | ||
+ | AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = StreamUtils.summingIntAccumulatorCombinerReducer(); | ||
+ | MapReduceFramework<Deck, Suit, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer); | ||
+ | Map<Suit, Integer> map = framework.mapReduceAll(decks); | ||
+ | map.entrySet().forEach(entry -> { | ||
+ | System.out.println(entry); | ||
+ | }); | ||
+ | </syntaxhighlight>}} | ||
+ | |||
+ | {{CollapsibleConsole|CardMapReduceClient Output|<pre style="border: 0px; background: #000; color:#fff;">HEARTS=216 | ||
+ | SPADES=216 | ||
+ | DIAMONDS=216 | ||
+ | CLUBS=216</pre>}} | ||
+ | |||
+ | ==Word Count Mapping Clients== | ||
+ | The word count mapping example clients use the beginning of [https://www.poetryfoundation.org/poems/46473/if--- If--- by Rudyard Kipling]. | ||
+ | ===WordCountMapperClient=== | ||
+ | {{Client|WordCountMapperClient|mapreduce.apps.wordcount.client|main}} | ||
+ | |||
+ | Passing: | ||
+ | |||
+ | new TextSection("If you can keep your head when all about you") | ||
+ | |||
+ | to the WordCountMapper's map method will return a list of Entries containing: | ||
+ | |||
+ | [[File:WordCount MapResult.svg|1200px]] | ||
+ | |||
+ | {{CollapsibleCode|WordCountMapperClient| | ||
+ | <syntaxhighlight lang="java"> | ||
+ | TextSection textSection = new TextSection("If you can keep your head when all about you"); | ||
+ | WordCountMapper mapper = new WordCountMapper(); | ||
+ | List<Map.Entry<String, Integer>> keyValuePairs = mapper.map(textSection); | ||
+ | keyValuePairs.forEach(kv -> { | ||
+ | System.out.println(kv); | ||
+ | }); | ||
+ | </syntaxhighlight>}} | ||
+ | |||
+ | {{CollapsibleConsole|WordCountMapperClient Output|<pre style="border: 0px; background: #000; color:#fff;">if=1 | ||
+ | you=1 | ||
+ | can=1 | ||
+ | keep=1 | ||
+ | your=1 | ||
+ | head=1 | ||
+ | when=1 | ||
+ | all=1 | ||
+ | about=1 | ||
+ | you=1</pre>}} | ||
+ | |||
+ | ===WordCountMapReduceClient=== | ||
+ | {{Client|WordCountMapReduceClient|mapreduce.apps.wordcount.client|main}} | ||
+ | |||
+ | {{CollapsibleCode|WordCountMapReduceClient| | ||
+ | <syntaxhighlight lang="java"> | ||
+ | TextSection[] textSections = { | ||
+ | new TextSection("If you can keep your head when all about you"), | ||
+ | new TextSection(" Are losing theirs and blaming it on you,"), | ||
+ | }; | ||
+ | WordCountMapper mapper = new WordCountMapper(); | ||
+ | AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = StreamUtils.summingIntAccumulatorCombinerReducer(); | ||
+ | MapReduceFramework<TextSection, String, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer); | ||
+ | Map<String, Integer> map = framework.mapReduceAll(textSections); | ||
+ | map.entrySet().forEach(entry -> { | ||
+ | System.out.println(entry); | ||
+ | }); | ||
+ | </syntaxhighlight>}} | ||
+ | |||
+ | {{CollapsibleConsole|WordCountMapReduceClient Output|<pre style="border: 0px; background: #000; color:#fff;">all=1 | ||
+ | theirs=1 | ||
+ | about=1 | ||
+ | it=1 | ||
+ | your=1 | ||
+ | when=1 | ||
+ | losing=1 | ||
+ | head=1 | ||
+ | can=1 | ||
+ | blaming=1 | ||
+ | are=1 | ||
+ | and=1 | ||
+ | keep=1 | ||
+ | if=1 | ||
+ | you=3 | ||
+ | on=1</pre>}} | ||
+ | |||
+ | ==K-mer Mapping Clients== | ||
+ | The k-mer mapping example clients use short example sequences such as <nowiki>ACTCATGAG</nowiki>. | ||
+ | ===KMerMapperClient=== | ||
+ | {{Client|KMerMapperClient|mapreduce.apps.wordcount.client|main}} | ||
+ | |||
+ | {{CollapsibleCode|KMerMapperClient| | ||
+ | <syntaxhighlight lang="java"> | ||
+ | byte[] sequence = "ACTCATGAG".getBytes(StandardCharsets.UTF_8); | ||
+ | KMerMapper mapper = new KMerMapper(3); | ||
+ | List<Map.Entry<String, Integer>> keyValuePairs = mapper.map(sequence); | ||
+ | keyValuePairs.forEach(kv -> { | ||
+ | System.out.println(kv); | ||
+ | }); | ||
+ | </syntaxhighlight>}} | ||
+ | |||
+ | {{CollapsibleConsole|KMerMapperClient Output|<pre style="border: 0px; background: #000; color:#fff;">ACT=1 | ||
+ | CTC=1 | ||
+ | TCA=1 | ||
+ | CAT=1 | ||
+ | ATG=1 | ||
+ | TGA=1 | ||
+ | GAG=1</pre>}} | ||
+ | |||
+ | ===KMerMapReduceClient=== | ||
+ | {{Client|KMerMapReduceClient|mapreduce.apps.wordcount.client|main}} | ||
+ | |||
+ | {{CollapsibleCode|KMerMapReduceClient| | ||
+ | <syntaxhighlight lang="java"> | ||
+ | byte[][] sequences = { | ||
+ | "ACTCATGAG".getBytes(StandardCharsets.UTF_8), | ||
+ | "CATGAAAAAA".getBytes(StandardCharsets.UTF_8), | ||
+ | }; | ||
+ | KMerMapper mapper = new KMerMapper(3); | ||
+ | AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = StreamUtils.summingIntAccumulatorCombinerReducer(); | ||
+ | MapReduceFramework<byte[], String, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer); | ||
+ | Map<String, Integer> map = framework.mapReduceAll(sequences); | ||
+ | map.entrySet().forEach(entry -> { | ||
+ | System.out.println(entry); | ||
+ | }); | ||
+ | </syntaxhighlight>}} | ||
+ | |||
+ | {{CollapsibleConsole|KMerMapReduceClient Output|<pre style="border: 0px; background: #000; color:#fff;">AAA=4 | ||
+ | ACT=1 | ||
+ | TCA=1 | ||
+ | CTC=1 | ||
+ | ATG=2 | ||
+ | GAA=1 | ||
+ | CAT=2 | ||
+ | GAG=1 | ||
+ | TGA=2</pre>}} | ||
=Code To Implement= | =Code To Implement= | ||
− | == | + | ==CardMapper== |
The specification for this mapper is outlined in the prep video: | The specification for this mapper is outlined in the prep video: | ||
+ | {{CollapsibleYouTube|MapReduce Tutorial|<youtube>K8VPHHPS3BQ</youtube>}} | ||
+ | {{CollapsibleYouTube|Learning MapReduce with Playing Cards|<youtube>bcjSe0xCHbE</youtube>}} | ||
+ | |||
+ | |||
+ | Non-numeric cards are considered to be bad data and ignored. Numeric cards should be emitted with their suit as the key and the numeric value as the value. Emitted key-value pairs are returned in a list of Entries. | ||
+ | |||
+ | {{CodeToImplement|CardMapper|map|mapreduce.app.cards.exercise}} | ||
+ | |||
+ | {{Sequential|List<Map.Entry<Suit, Integer>> map(Deck deck)}} | ||
+ | |||
+ | Make sure you have implemented [[Not_Thread_Safe_Hash_Table_Assignment#DefaultEntry.3CK.2CV.3E|DefaultEntry<K,V>]] before starting this assignment. | ||
+ | |||
+ | ==Word Count Mapper== | ||
− | + | {{CodeToImplement|WordCountMapper|map|mapreduce.apps.wordcount.exercise}} | |
− | + | {{Sequential|List<Map.Entry<String, Integer>> map(TextSection textSection)}} | |
− | + | Counting occurrences of words in text is a classic example of MapReduce. We will '''ignore any zero length words and convert the remaining words to lower-case''' so as to get a case insensitive count. Emitting each lower-cased word as the key with the value of 1 should do the trick here. | |
− | + | <!-- COMMENTED OUT count the number of times a word appears in a given text, using MapReduce. To accomplish this, you will need to create both the mapper and the reducer. Navigate to the <code>WordCountMapper.java</code> and <code>IntSumListAccumulatingReducer.java</code> classes. You will specifically define how the framework accomplishes the map and reduce methods. | |
− | + | The only method you will need to alter is the map method. | |
− | + | In this method, you need to record every instance of a given word and feed it into the keyValuePairConsumer. To do this, access all of the words in the TextSection and if the length of the word is greater than zero (meaning it is not just blank space), convert it into lower-case and accept it into the consumer. COMMENTED OUT--> | |
+ | Hint: Look at the methods in TextSection and the toLowerCase() method for strings for assistance. | ||
− | + | ==K-mer Count Mapper== | |
− | + | [http://www.csbio.unc.edu/mcmillan/Comp555S17/Lecture02.pdf K-mer counting is a useful technique in bioinformatics]. | |
− | + | Further background information on k-mer counting can be found [https://en.wikipedia.org/wiki/K-mer here]. | |
− | + | The 3-mers in the chromosome data: | |
− | + | <nowiki>ACTCATGAG</nowiki> | |
− | + | are: | |
− | + | <nowiki>ACT | |
− | + | CTC | |
+ | TCA | ||
+ | CAT | ||
+ | ATG | ||
+ | TGA | ||
+ | GAG</nowiki> | ||
− | + | The 4-mers in that same chromosome data: | |
− | + | <nowiki>ACTCATGAG</nowiki> | |
− | + | are: | |
− | + | <nowiki>ACTC | |
+ | CTCA | ||
+ | TCAT | ||
+ | CATG | ||
+ | ATGA | ||
+ | TGAG</nowiki> | ||
− | + | {{CodeToImplement|KMerMapper|map|mapreduce.apps.kmer.studio}} | |
− | + | {{Sequential|List<Map.Entry<String, Integer>> map(byte[] sequence)}} | |
− | {{ | ||
− | + | Take note of the private instance variable declared and initialized for you. | |
+ | Be sure to use the provided toStringKMer(sequence, offset, kMerLength) method to generate your k-mers: | ||
− | + | <syntaxhighlight lang="java">private static String toStringKMer(byte[] sequence, int offset, int kMerLength) { | |
+ | return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8); | ||
+ | }</syntaxhighlight> | ||
− | + | This mapper is similar to the [[#Word Count Mapper]] except that the k-mers overlap with each other while words are separate. | |
− | + | As the emitted values for each key will be later summed up in the reduction phase, what value makes sense to emit with each key? | |
=Testing Your Solution= | =Testing Your Solution= | ||
− | + | {{TestSuite|_MappersSuitableForPairingWithIntSummingReducerTestSuite|mapreduce.apps}} | |
− | + | ||
− | + | =Pledge, Acknowledgments, Citations= | |
− | {{TestSuite| | + | {{Pledge|map-reduce-mapper}} |
Latest revision as of 18:49, 20 February 2024
Motivation
In previous semesters the MapReduce exercise has proven to be the most challenging. We will start by building some Mappers on our way to the final boss.
Each of the Mappers built today can be paired with an Int Summing AccumulatorCombinerReducer:
- a card mapper that matches the spec outlined in the prep video,
- a simple word counting mapper, and
- an analogous k-mer counting mapper.
Note: the k-mer counting mapper will prepare us for (and hopefully lessen the burden of) an exercise later in the semester.
Code To Use
Previous Exercise
DefaultEntry<K,V>
Provided
CardMapper Utilities
Deck implements Iterable<Card>
Card
- card.rank()
- card.suit()
Rank
- rank.numericValue() note: returns Optional<Integer>
Suit
WordCount Mapper Utilities
TextSection
- textSection.words()
K-mer Mapper Utilities
toStringKMer(sequence, offset, kMerLength) |
---|
private static String toStringKMer(byte[] sequence, int offset, int kMerLength) {
return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8);
}
|
Code To Investigate
Note: each of the clients prints entries. The entries produced by the map methods of the Mappers are instances of DefaultEntry. The entries produced by the StreamFramework are instances of a different implementation of Entry. Their toString() methods might be slightly different, but rest assured they are all Entries.
Card Mapping Clients
CardMapperClient
class: | CardMapperClient.java | CLIENT |
package: | mapreduce.apps.cards.client | |
source folder: | student/src/main/java |
CardMapperClient |
---|
Deck deck = Deck.createFull();
CardMapper mapper = new CardMapper();
List<Map.Entry<Suit, Integer>> keyValuePairs = mapper.map(deck);
keyValuePairs.forEach(kv -> {
System.out.println(kv);
});
|
CardMapperClient Output |
---|
SPADES=10 SPADES=9 SPADES=8 SPADES=7 SPADES=6 SPADES=5 SPADES=4 SPADES=3 SPADES=2 HEARTS=10 HEARTS=9 HEARTS=8 HEARTS=7 HEARTS=6 HEARTS=5 HEARTS=4 HEARTS=3 HEARTS=2 DIAMONDS=10 DIAMONDS=9 DIAMONDS=8 DIAMONDS=7 DIAMONDS=6 DIAMONDS=5 DIAMONDS=4 DIAMONDS=3 DIAMONDS=2 CLUBS=10 CLUBS=9 CLUBS=8 CLUBS=7 CLUBS=6 CLUBS=5 CLUBS=4 CLUBS=3 CLUBS=2 |
CardMapReduceClient
class: | CardMapReduceClient.java | CLIENT |
package: | mapreduce.apps.cards.client | |
source folder: | student/src/main/java |
CardMapReduceClient |
---|
Deck[] decks = {
Deck.createFull(),
Deck.createFull(),
Deck.createFull(),
Deck.createFull(),
};
CardMapper mapper = new CardMapper();
AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = StreamUtils.summingIntAccumulatorCombinerReducer();
MapReduceFramework<Deck, Suit, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
Map<Suit, Integer> map = framework.mapReduceAll(decks);
map.entrySet().forEach(entry -> {
System.out.println(entry);
});
|
CardMapReduceClient Output |
---|
HEARTS=216 SPADES=216 DIAMONDS=216 CLUBS=216 |
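As a quick sanity check on these totals: the CardMapperClient output shows that only the values 2 through 10 are emitted for each suit, so one full deck contributes 2 + 3 + ... + 10 = 54 per suit, and four full decks give 216. The snippet below is only a standalone arithmetic sketch; the class name `CardTotalCheck` and its methods are made up for illustration and are not part of the course code.

```java
import java.util.stream.IntStream;

public class CardTotalCheck {
    // Sum of the numeric values 2..10 emitted for one suit of one full deck.
    static int perSuitPerDeck() {
        return IntStream.rangeClosed(2, 10).sum();
    }

    // Total for one suit across the given number of full decks.
    static int perSuit(int deckCount) {
        return perSuitPerDeck() * deckCount;
    }

    public static void main(String[] args) {
        System.out.println(perSuitPerDeck()); // 54
        System.out.println(perSuit(4));       // 216
    }
}
```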
Word Count Mapping Clients
The word count mapping example clients use the beginning of If--- by Rudyard Kipling.
WordCountMapperClient
class: | WordCountMapperClient.java | CLIENT |
package: | mapreduce.apps.wordcount.client | |
source folder: | student/src/main/java |
Passing:
new TextSection("If you can keep your head when all about you")
to the WordCountMapper's map method will return a list of Entries containing:
WordCountMapperClient |
---|
TextSection textSection = new TextSection("If you can keep your head when all about you");
WordCountMapper mapper = new WordCountMapper();
List<Map.Entry<String, Integer>> keyValuePairs = mapper.map(textSection);
keyValuePairs.forEach(kv -> {
System.out.println(kv);
});
|
WordCountMapperClient Output |
---|
if=1 you=1 can=1 keep=1 your=1 head=1 when=1 all=1 about=1 you=1 |
WordCountMapReduceClient
class: | WordCountMapReduceClient.java | CLIENT |
package: | mapreduce.apps.wordcount.client | |
source folder: | student/src/main/java |
WordCountMapReduceClient |
---|
TextSection[] textSections = {
new TextSection("If you can keep your head when all about you"),
new TextSection(" Are losing theirs and blaming it on you,"),
};
WordCountMapper mapper = new WordCountMapper();
AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = StreamUtils.summingIntAccumulatorCombinerReducer();
MapReduceFramework<TextSection, String, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
Map<String, Integer> map = framework.mapReduceAll(textSections);
map.entrySet().forEach(entry -> {
System.out.println(entry);
});
|
WordCountMapReduceClient Output |
---|
all=1 theirs=1 about=1 it=1 your=1 when=1 losing=1 head=1 can=1 blaming=1 are=1 and=1 keep=1 if=1 you=3 on=1 |
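To see where a count such as you=3 comes from, the sketch below imitates the sum-by-key step with a plain loop. It is not the course's StreamMapReduceFramework or summingIntAccumulatorCombinerReducer; `SumByKeySketch`, `entry`, and `sumByKey` are hypothetical names, `AbstractMap.SimpleImmutableEntry` stands in for DefaultEntry, and a TreeMap is used only so that the printed order is predictable.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SumByKeySketch {
    // Shorthand for building one emitted key-value pair (stand-in for DefaultEntry).
    static Map.Entry<String, Integer> entry(String key, int value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    // Adds up the values of all entries that share the same key.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> entries) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sums.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> emitted = Arrays.asList(
                entry("you", 1), entry("can", 1), entry("you", 1), entry("you", 1));
        System.out.println(sumByKey(emitted)); // {can=1, you=3}
    }
}
```

The mapper never needs to know how many times a word has been seen so far. It emits one entry per occurrence and leaves the adding to the reduction.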
K-mer Mapping Clients
The k-mer mapping example clients use short example sequences such as ACTCATGAG.
KMerMapperClient
class: | KMerMapperClient.java | CLIENT |
package: | mapreduce.apps.wordcount.client | |
source folder: | student/src/main/java |
KMerMapperClient |
---|
byte[] sequence = "ACTCATGAG".getBytes(StandardCharsets.UTF_8);
KMerMapper mapper = new KMerMapper(3);
List<Map.Entry<String, Integer>> keyValuePairs = mapper.map(sequence);
keyValuePairs.forEach(kv -> {
System.out.println(kv);
});
|
KMerMapperClient Output |
---|
ACT=1 CTC=1 TCA=1 CAT=1 ATG=1 TGA=1 GAG=1 |
KMerMapReduceClient
class: | KMerMapReduceClient.java | CLIENT |
package: | mapreduce.apps.wordcount.client | |
source folder: | student/src/main/java |
KMerMapReduceClient |
---|
byte[][] sequences = {
"ACTCATGAG".getBytes(StandardCharsets.UTF_8),
"CATGAAAAAA".getBytes(StandardCharsets.UTF_8),
};
KMerMapper mapper = new KMerMapper(3);
AccumulatorCombinerReducer<Integer, ?, Integer> accumulatorCombinerReducer = StreamUtils.summingIntAccumulatorCombinerReducer();
MapReduceFramework<byte[], String, Integer, ?, Integer> framework = new StreamMapReduceFramework<>(mapper, accumulatorCombinerReducer);
Map<String, Integer> map = framework.mapReduceAll(sequences);
map.entrySet().forEach(entry -> {
System.out.println(entry);
});
|
KMerMapReduceClient Output |
---|
AAA=4 ACT=1 TCA=1 CTC=1 ATG=2 GAA=1 CAT=2 GAG=1 TGA=2 |
Code To Implement
CardMapper
The specification for this mapper is outlined in the prep video:
Video: MapReduce Tutorial |
---|
Video: Learning MapReduce with Playing Cards |
---|
Non-numeric cards are considered to be bad data and ignored. Numeric cards should be emitted with their suit as the key and the numeric value as the value. Emitted key-value pairs are returned in a list of Entries.
class: | CardMapper.java | |
methods: | map | |
package: | mapreduce.app.cards.exercise | |
source folder: | student/src/main/java |
method: List<Map.Entry<Suit, Integer>> map(Deck deck)
(sequential implementation only)
Make sure you have implemented DefaultEntry<K,V> before starting this assignment.
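The filtering rule described above can be sketched without the course classes. Everything in the block below is an assumption made for illustration: the nested `Suit`, `Rank`, and `Card` types are tiny stand-ins (with only four ranks) for the provided ones, `AbstractMap.SimpleImmutableEntry` stands in for your DefaultEntry, and `CardMapperSketch` is a hypothetical name. The point is only the shape of the Optional check.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class CardMapperSketch {
    // Hypothetical stand-ins for the provided Suit, Rank, and Card types.
    enum Suit { SPADES, HEARTS, DIAMONDS, CLUBS }

    enum Rank {
        TWO(2), TEN(10), JACK(null), ACE(null);

        private final Integer value;

        Rank(Integer value) { this.value = value; }

        // Empty for non-numeric ranks, mirroring the Optional<Integer> return noted earlier.
        Optional<Integer> numericValue() { return Optional.ofNullable(value); }
    }

    static final class Card {
        private final Rank rank;
        private final Suit suit;

        Card(Rank rank, Suit suit) { this.rank = rank; this.suit = suit; }

        Rank rank() { return rank; }
        Suit suit() { return suit; }
    }

    // Emits (suit, numeric value) for numeric cards and skips everything else.
    static List<Map.Entry<Suit, Integer>> map(Iterable<Card> deck) {
        List<Map.Entry<Suit, Integer>> result = new ArrayList<>();
        for (Card card : deck) {
            Optional<Integer> numeric = card.rank().numericValue();
            if (numeric.isPresent()) {
                result.add(new AbstractMap.SimpleImmutableEntry<>(card.suit(), numeric.get()));
            }
        }
        return result;
    }

    // Small fixed input: one non-numeric card between two numeric ones.
    static List<Map.Entry<Suit, Integer>> demo() {
        return map(Arrays.asList(
                new Card(Rank.TEN, Suit.SPADES),
                new Card(Rank.JACK, Suit.HEARTS),
                new Card(Rank.TWO, Suit.CLUBS)));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In the sketch the jack of hearts produces no entry at all. It is dropped rather than emitted with a placeholder value.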
Word Count Mapper
class: | WordCountMapper.java | |
methods: | map | |
package: | mapreduce.apps.wordcount.exercise | |
source folder: | student/src/main/java |
method: List<Map.Entry<String, Integer>> map(TextSection textSection)
(sequential implementation only)
Counting occurrences of words in text is a classic example of MapReduce. We will ignore any zero-length words and convert the remaining words to lowercase to get a case-insensitive count. Emitting each lowercased word as the key with a value of 1 should do the trick here.
Hint: Look at the methods in TextSection and the toLowerCase() method for strings for assistance.
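A rough, self-contained sketch of that rule follows. It does not use TextSection: the `words` helper is a hypothetical stand-in that splits on runs of non-letter characters (the real textSection.words() may tokenize differently), `AbstractMap.SimpleImmutableEntry` stands in for DefaultEntry, and `toLowerCase(Locale.ROOT)` is used instead of the no-argument form only to keep the result independent of the machine's locale.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WordCountMapperSketch {
    // Hypothetical stand-in for textSection.words(): split on runs of non-letters.
    // A leading separator yields a zero-length first element, which map() skips.
    static String[] words(String text) {
        return text.split("[^A-Za-z]+");
    }

    // Emits (lowercased word, 1) for every non-empty word, in order of appearance.
    static List<Map.Entry<String, Integer>> map(String text) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (String word : words(text)) {
            if (word.length() > 0) {
                result.add(new AbstractMap.SimpleImmutableEntry<>(word.toLowerCase(Locale.ROOT), 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        map(" Are losing theirs and blaming it on you,").forEach(System.out::println);
    }
}
```

The leading spaces in the second line of the poem are what make the zero-length check matter here: with this splitting rule the first token is an empty string.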
K-mer Count Mapper
K-mer counting is a useful technique in bioinformatics.
Further background information on k-mer counting can be found here.
The 3-mers in the chromosome data:
ACTCATGAG
are:
ACT CTC TCA CAT ATG TGA GAG
The 4-mers in that same chromosome data:
ACTCATGAG
are:
ACTC CTCA TCAT CATG ATGA TGAG
class: | KMerMapper.java | |
methods: | map | |
package: | mapreduce.apps.kmer.studio | |
source folder: | student/src/main/java |
method: List<Map.Entry<String, Integer>> map(byte[] sequence)
(sequential implementation only)
Take note of the private instance variable declared and initialized for you. Be sure to use the provided toStringKMer(sequence, offset, kMerLength) method to generate your k-mers:
private static String toStringKMer(byte[] sequence, int offset, int kMerLength) {
return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8);
}
This mapper is similar to the Word Count Mapper, except that the k-mers overlap with each other while words are separate.
As the emitted values for each key will be later summed up in the reduction phase, what value makes sense to emit with each key?
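The overlapping windows can be sketched as a loop over start offsets. This is an illustration under assumptions, not the provided KMerMapper: here the k-mer length is passed as a parameter rather than read from the instance variable, `AbstractMap.SimpleImmutableEntry` stands in for DefaultEntry, and `KMerMapperSketch` is a hypothetical name. Only `toStringKMer` is copied from the provided utility. The value emitted matches what the KMerMapperClient output shows.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KMerMapperSketch {
    // Same body as the provided utility method.
    private static String toStringKMer(byte[] sequence, int offset, int kMerLength) {
        return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8);
    }

    // Slides a window of kMerLength bytes across the sequence, one offset at a time.
    // A sequence of length n yields n - kMerLength + 1 k-mers (none if it is too short).
    static List<Map.Entry<String, Integer>> map(byte[] sequence, int kMerLength) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (int offset = 0; offset + kMerLength <= sequence.length; offset++) {
            String kMer = toStringKMer(sequence, offset, kMerLength);
            result.add(new AbstractMap.SimpleImmutableEntry<>(kMer, 1));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] sequence = "ACTCATGAG".getBytes(StandardCharsets.UTF_8);
        map(sequence, 3).forEach(System.out::println);
    }
}
```

The loop bound is the part to double-check: stopping one offset too early drops the last k-mer, and going one offset too far reads past the end of the array.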
Testing Your Solution
class: | _MappersSuitableForPairingWithIntSummingReducerTestSuite.java | |
package: | mapreduce.apps | |
source folder: | testing/src/test/java |
Pledge, Acknowledgments, Citations
file: | map-reduce-mapper-pledge-acknowledgments-citations.txt |