Abstract
MapReduce is a popular programming model currently used for application develop-
ment on large scale clusters. MapReduce realizes the concept of parallel programming
skeletons: The model describes the overall structure of a computation, the programmer
plugs in the low level problem-specific code that turns the generic description of the
problem to the final program and the runtime system completely hides the task manage-
ment and synchronization issues that make parallel programming complex and unreli-
able. The increasing scale of multi-core platforms stresses even the need for structured
parallel programming models like MapReduce in the development of shared-memory
applications. In this project, we implemented the MapReduce Model for Skandium,
which is Java-based algorithmic skeleton library that targets multi-core architectures.
After the skeleton was implemented we tested and tuned its performance for a selection
of typical MapReduce applications. Our objectives were two-fold: provide an abstract
and easy to use programming model for MapReduce and identify the main factors
that affect the skeleton’s performance on shared memory architectures and when it is
implemented on top of the Java platform.
i
Acknowledgements
First, I would like to thank my academic supervisor, Murray Cole, for his his invaluable
help and guidance throughout the project. I would also like to thank my family for their
constant support.
ii
Declaration
I declare that this thesis was composed by myself, that the work contained herein is
my own except where explicitly stated otherwise in the text, and that this work has not
been submitted for any other degree or professional qualification except as specified.
(Ioannis Assiouras)
iii
Table of Contents
1
Introduction
1
1.1
Project Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
1.2
Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
2
Related Background
5
2.1
Algorithmic Skeletons
. . . . . . . . . . . . . . . . . . . . . . . . .
5
2.2
The MapReduce Skeleton . . . . . . . . . . . . . . . . . . . . . . . .
6
2.3
The Phoenix Implementation . . . . . . . . . . . . . . . . . . . . . .
7
2.4
The Skandium Library . . . . . . . . . . . . . . . . . . . . . . . . .
8
2.4.1
The programming model . . . . . . . . . . . . . . . . . . . .
8
2.4.2
The runtime system . . . . . . . . . . . . . . . . . . . . . . .
9
2.4.3
The Skandium Map Skeleton . . . . . . . . . . . . . . . . . .
10
2.5
Typical MapReduce Applications . . . . . . . . . . . . . . . . . . . .
11
2.5.1
Word Count . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
2.5.2
Inverted Index
. . . . . . . . . . . . . . . . . . . . . . . . .
12
2.5.3
Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . .
12
2.5.4
Histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.5.5
KMeans . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.6
Garbage Collection principles
. . . . . . . . . . . . . . . . . . . . .
14
3
MapReduce for Skandium: The programming model
17
3.1
The MapReduce integration into Skandium . . . . . . . . . . . . . .
17
3.2
The Programming Interface . . . . . . . . . . . . . . . . . . . . . . .
20
3.2.1
The Splitter . . . . . . . . . . . . . . . . . . . . . . . . . . .
20
3.2.2
The Mapper . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
3.2.3
The Reducer . . . . . . . . . . . . . . . . . . . . . . . . . .
22
3.2.4
The Merge muscle . . . . . . . . . . . . . . . . . . . . . . .
23
iv
3.2.5
Creating the Skeleton’s Instance . . . . . . . . . . . . . . . .
24
4
MapReduce For Skandium: Implementation details
25
4.1
The Skeleton’s Instantiation
. . . . . . . . . . . . . . . . . . . . . .
25
4.2
Implementation of the generic muscles . . . . . . . . . . . . . . . . .
26
4.2.1
An initial Approach . . . . . . . . . . . . . . . . . . . . . . .
26
4.2.2
Parallelizing the Store muscle . . . . . . . . . . . . . . . . .
27
4.2.3
Using a caching technique to resolve collisions . . . . . . . .
29
4.2.4
Implementing Phoenix’s storing/partitioning scheme . . . . .
29
5
Performance Evaluation
33
5.1
Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . . .
33
5.1.1
Shared Memory Systems . . . . . . . . . . . . . . . . . . . .
34
5.1.2
Applications . . . . . . . . . . . . . . . . . . . . . . . . . .
34
5.2
Evaluation of the four implementation schemes . . . . . . . . . . . .
35
5.3
Evaluation of the Phoenix Scheme . . . . . . . . . . . . . . . . . . .
37
5.4
Comparison to manual Java threading . . . . . . . . . . . . . . . . .
39
5.4.1
Implementation using manual threading . . . . . . . . . . . .
39
5.4.2
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . .
43
5.5
Evaluating the Garbage Collector’s impact . . . . . . . . . . . . . . .
44
6
Optimization of the MapReduce Skeleton
46
6.1
Hash Table Improvements
. . . . . . . . . . . . . . . . . . . . . . .
46
6.1.1
Choosing an optimal Hash Table Size . . . . . . . . . . . . .
46
6.1.2
Using a binary search tree for O log(n) store . . . . . . . . . .
49
6.1.3
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . .
50
6.2
Minimizing the Garbage Collector’s impact . . . . . . . . . . . . . .
51
6.2.1
An object reuse technique . . . . . . . . . . . . . . . . . . .
51
6.2.2
Tuning the Garbage Collector . . . . . . . . . . . . . . . . .
53
6.3
Auto Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
55
6.3.1
Auto-tuning for the MapReduce skeleton . . . . . . . . . . .
55
6.3.2
A simple Auto-tuning mechanism . . . . . . . . . . . . . . .
57
7
Conclusions and Future Work
59
7.1
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
60
Bibliography
62
v