Chapter 6. Optimization of the MapReduce Skeleton
not present, then the cost is O(1) plus O(k) to move all the combiners down to make
room for a new combiner. This scheme is inefficient for workloads that produce a large
number of pairs with a high percentage of distinct keys, since the combiners must
frequently be moved one position down to make room for a new combiner. To resolve
this problem we implemented a hash+tree data structure similar to the one used by
Metis. The hash+tree data structure uses a binary tree in each entry of the hash array
rather than a sorted list. The binary tree supports insertion in O(log k), regardless of
whether the pair is already stored in the tree or not.
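A minimal sketch of the idea in Java follows; the class and method names are illustrative, not taken from our library, and java.util.TreeMap (a red-black tree) stands in for the binary tree in each hash-array entry:

```java
import java.util.TreeMap;

// Sketch of a "hash+tree" intermediate store: a fixed-size hash array
// whose entries are balanced binary search trees instead of sorted
// lists. Inserting or updating a key costs O(log k) for k keys in a
// bucket, whether or not the key is already present.
public class HashTreeStore {
    private final TreeMap<String, Integer>[] buckets;

    @SuppressWarnings("unchecked")
    public HashTreeStore(int width) {
        buckets = new TreeMap[width];
        for (int i = 0; i < width; i++) {
            buckets[i] = new TreeMap<>();
        }
    }

    // Combine a value into the tree of the key's bucket; for word
    // count the combiner simply sums the counts.
    public void emit(String key, int value) {
        int slot = (key.hashCode() & 0x7fffffff) % buckets.length;
        buckets[slot].merge(key, value, Integer::sum);
    }

    public Integer lookup(String key) {
        int slot = (key.hashCode() & 0x7fffffff) % buckets.length;
        return buckets[slot].get(key);
    }
}
```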
Figure 6.5 shows the performance of word count for the hash+tree data structure
compared to the hash+sorted list.
Figure 6.5: The word count speedup with varying hash array size for a "hash+tree" and
a "hash+sorted list" data structure
6.1.3 Discussion
From the above results we can draw the conclusion that one of the factors that hurts
the performance of our implementation is its uniform intermediate storage of key-value
pairs. The key-value pairs are always stored in a fixed-size hash table regardless of the
characteristics of the input, and this implementation decision is inefficient for most
applications.
In order to achieve the best possible speedup for a given application, our implementation
should provide a flexible intermediate key-value storage abstraction that
permits workload-tailored implementations. Phoenix 2 [13] allows the user to specify
the width of the hash array, while Phoenix++ [10] exposes the intermediate key-value
pair storage and allows the user to select one of the predefined implementations (i.e. a
variable-width hash table, a fixed-size array, or an array that is shared by all threads)
or even define a new scheme.
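One way such an abstraction could look is sketched below; the interface and class names are hypothetical (this is not Phoenix++ code), but the two implementations mirror two of the container choices just described:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction in the spirit of Phoenix++'s
// user-selectable containers: the runtime programs against the
// interface, and the user (or an auto-tuner) picks the implementation.
interface IntermediateStore<K> {
    void emit(K key, int value);    // combine a value into the store
    Map<K, Integer> contents();     // view for the reduce phase
}

// A variable-width hash map, suitable for workloads with many
// distinct keys of unknown cardinality.
class VariableHashStore<K> implements IntermediateStore<K> {
    private final Map<K, Integer> map = new HashMap<>();
    public void emit(K key, int value) { map.merge(key, value, Integer::sum); }
    public Map<K, Integer> contents() { return map; }
}

// A fixed-size array, suitable when the keys are small dense
// integers (e.g. histogram bins).
class FixedArrayStore implements IntermediateStore<Integer> {
    private final int[] counts;
    FixedArrayStore(int size) { counts = new int[size]; }
    public void emit(Integer key, int value) { counts[key] += value; }
    public Map<Integer, Integer> contents() {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < counts.length; i++)
            if (counts[i] != 0) m.put(i, counts[i]);
        return m;
    }
}
```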
In contrast, we are more interested in an auto-tuning implementation of MapReduce
where the runtime system would automatically select the parameters that best fit
the workload characteristics. We will consider the skeleton’s auto-tuning in the last
section of this chapter.
6.2 Minimizing the Garbage Collector's impact
In the previous chapter, we identified and quantified the garbage collector bottleneck
in the MapReduce implementation. In this section we present two ways of reducing
the garbage collection penalty. The first is an object reuse technique that aims
to reduce the live data of the MapReduce application. The second is tuning the
garbage collector itself to best match the requirements of our MapReduce
implementation. We should note that all the results in this section refer
only to System1.
6.2.1 An object reuse technique
Our optimization is based on an object reuse technique for the emitted key objects.
We have added to our system an object pool where all the distinct keys are stored.
When a new key-value pair is created in the mapper, the system checks the key pool to
determine if the same key has been emitted before. If it has, the system uses a reference
to the old key for all subsequent operations on the pair. If the specific key is used for
the first time it is added to the pool.
The listing below shows the constructor of the KeyValuePair object after our addition
of the key pool to the library.
1 public class KeyValuePair {
2     public static KeyPool keyPool = new KeyPool();
3     public Object key = null;
4     public Object value = null;
5     public KeyValuePair(Object key, Object value) {
6         this.key = keyPool.getCanonicalVersion(key);
7         this.value = value;
8     }
9 }
As we can see in line 6, the system drops the reference to the key newly created
by the user, and instead points the reference to an equal key in the key pool. This
means that, in the end, all pairs with the same key have their key references
pointing to the same object in the key pool. The object pool mechanism does not reduce
the number of key object allocations: the keys have already been allocated by the user
code in the mapper by the time their reference is passed to the constructor of the
KeyValuePair object. Instead, this optimization significantly reduces the live data
on the heap, since it effectively kills all the duplicate keys by removing the
references to them. Moreover, since the duplicate keys are killed shortly after their
allocation, they die in the new generation and are garbage collected very quickly,
as we described in chapter 2. The price we pay to use this mechanism is the extra
lookup for the existence of the key in the key pool. This overhead increases with the
number of emitted keys, and the mechanism may not pay off if most of the keys in the
application are distinct.
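The listing above shows only the call site; the body of KeyPool itself could look like the following sketch (an illustrative implementation, not the library's actual class), which canonicalizes keys through a concurrent map so the lookup is safe across mapper threads:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the key pool: getCanonicalVersion returns the first-seen
// instance of an equal key, so later duplicate key objects lose their
// last reference and die young.
public class KeyPool {
    private final ConcurrentHashMap<Object, Object> pool =
            new ConcurrentHashMap<>();

    public Object getCanonicalVersion(Object key) {
        // putIfAbsent returns the previously stored key, or null if
        // this key is the first of its kind; in the latter case the
        // caller's object itself becomes the canonical version.
        Object canonical = pool.putIfAbsent(key, key);
        return canonical != null ? canonical : key;
    }
}
```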
Figure 6.6 shows the speedup of the word count and the inverted index applications
as the heap size shrinks when we are using the object pool technique. For the purposes
of comparison, we show the unoptimized version without the key pool in the same
figure.
Figure 6.6: The speedup as the heap size decreases with and without the key pool for
word count (left) and inverted index (right)
The results show that the key pool achieves a slight increase in the speedup for
the two applications. For large heap sizes the key pool performs worse because of